
ABSTRACT 

The  paper  describes  an  extended  version  of  Backus  Naur  Form 
which  can  be  translated  in  one  pass  to  a  parsing  algorithm.   The  restric- 
tions which  must  be  placed  on  the  BNF  to  achieve  this  end  are  minimal, 
and  it  is  proved  that  they  do  not  alter  the  generative  capacity  of  the 
metalanguage.   The  recursive  descent  parsing  algorithm  produced  operates 
at  about  1000  cards  per  minute  for  typical  languages  on  the  B-5500. 
1.   INTRODUCTION

Waning academic interest in the subject of syntax analysis per
se attests to the thoroughness with which this particular vein of
linguistic lore has been worked. Notwithstanding the light shed and the
limitations  exposed  by  such  analysis,  writing  programs  for  syntax  analy- 
sis has  remained  an  ad  hoc  affair,  especially  in  the  computer  industry, 
where  the  effects  of  such  advances  should  logically  have  been  felt.   In 
short,  syntax  analysis  has  come  to  and  gone  from  the  academic  scene  with- 
out much  apparent  impact  on  the  industrial  tempo. 

Those  in  the  universities  who  have  been  involved  in  compiler 
writing  have  felt  the  need  for  neat  and  fast  ways  to  write  down  syntax 
recognition  algorithms  and,  in  many  cases,  could  have  had  their  burden 
lightened by having some of the manufacturers' software packages available.
Those  in  industry  likewise  have  not  utilized  much  of  the  work  done  in  the 
universities.   The  language  described  here  is  specifically  an  algorithmic 
language,  but  is  clothed  in  terms  which  make  it  appear  to  be  a  syntax 
description  language.  As  an  algorithmic  language,  it  could  be  added  to 
the repertoire of multi-purpose languages, to be used like any other
programming device when convenient.

The language described here (TBNF, with T for translatable) is based
on Backus Naur Form (BNF) but employs some of the devices used by Brooker
and Morris [1] and by Kleene [2] in their respective systems. A minor
variant of what follows is operational on a B-5500 and has been used in
the syntax phase of two ILLIAC IV languages and several support languages
at the University of Illinois and at the Burroughs Corporation.


In these examples the syntaxes of well defined languages of
greater complexity than Algol 60 were each written in approximately one
man week and yield parsers of moderate speed (900 to 1800 cards per
minute on the B-5500). As testimony to the flexibility of the system,
one language (Tranquil [3] for ILLIAC IV) was converted from an earlier
Floyd production scheme without a single change to the semantics routines.

The method of syntax analysis is a straightforward top to
bottom recursive descent method based on this considerably expanded
version of BNF. The machinery of TBNF includes mnemonic symbols for denoting
a sequence of strings without resorting to the usual recursive method
(i.e., a list of statements can be written list <statement> instead of
introducing the nonterminal <list of statements> and using a recursive
production). Several other devices are employed to reduce the number of
nonterminals necessary to describe a language and avoid the semi-infinite
verbosity associated with conventional BNF. Thus, for example, a three
hundred production grammar in BNF became a forty production grammar in
the extended language. One of the first benefits of such compactification
was that programmers developed a very good feel for the parsing
method and quickly learned to program efficiently in TBNF.

This paper demonstrates how TBNF can be translated, in one
pass, to a parsing algorithm. The method does not detect ambiguity or
any of the other traditional properties (which turn out to be of
remarkably little relevance in this context), and not surprisingly translates
the TBNF quite rapidly (on the B-5500 the elapsed time between
submitting a grammar and producing a compiler is of the order of two
minutes). A short turnaround time is of considerable advantage when
a new language is being developed.

In  summary,  it  is  fair  to  say  that  the  language  to  be  described 
has  reduced  the  syntax  analysis  of  compiling  (as  far  as  it  can  be  divorced 
from  the  semantics)  to  a  trivial  matter,  without  imposing  restrictions 
on  the  semantics. 


2.   THE  SYNTAX  LANGUAGE 

2.1  Automatic  Computation  Based  on  BNF 

One of the surprising things about BNF is that, despite its
apparent simplicity and elegance, it is a very difficult language to
handle automatically. The question of ambiguity is typical of the
difficulties one encounters. For example, it is impossible to write an
effective procedure to detect the ambiguities in an arbitrary context free
grammar (the proof is given in Section 4.4).

This may not seem a serious handicap, for it seems possible
to retrieve some usefulness from BNF by dropping the requirement of
unambiguity. Even so, one is faced with unusual difficulties in
implementing BNF in its full generality. It is manifestly obvious that there
is a linear lower bound $T(n)$ to the parsing time for a string of length
$n$. Yet no general scheme, to the author's knowledge, has succeeded in
realizing linear parsing time. On the other hand, at least three workers
(Earley [5], Kasami [6], and Younger [7]) have established $n^3$ time bounds
for context free languages. Earley further strengthens his result to
$n^2$ in a large number of cases. Their proofs are constructive. One is
tempted to conclude that an $n^3$ bound is the best one can expect in
general.

It  is  not  surprising,  then,  that  most  practical  schemes  place 
some  restrictions  on  the  BNF  accepted  (either  LR(k)  or  LR(m,n)  for  some 
finite,  usually  small  (e.g.,  1  or  2)  values  of  m,  n).   The  method  des- 
cribed in  this  paper  restricts  the  BNF  also.  To  distinguish  this  form, 
we  will  refer  to  it  as  TBNF  (Translatable  BNF). 


There is one other characteristic of BNF which makes it
inconvenient to use in a practical translator writing system (TWS).
This characteristic is its excessive verbosity. It is quite unnecessary
to define both the nonterminals <list of xs> and <x> when it is clear
that the former is obtained by compounding several of the latter. In the
author's experience, students revert to a more compact notation when
endeavoring to decipher complex BNF productions. TBNF attempts to meet
this objection by providing abbreviations for the common constructs.

2.2  The Language TBNF

The syntax of TBNF is similar to BNF except for the following
use of special characters and words.

(i)  Kleene Star.

<A> *  =  λ | <A> | <A> <A> | ...

that is, any number n of <A>'s concatenated together
(n ≥ 0), with λ representing the null symbol.

(ii)   Brooker and Morris' question mark.
<A> ?  =  <A> | λ
to mean the optional presence of some symbol.

(iii)   Bracket  construct. 

Square brackets [ , ] are used to delimit groups of symbols; e.g.,

<X> ::= <Y> [ <A> | <B> | <C> ] <Z>

is equivalent to

<X>      ::=  <Y> <dummy> <Z>
<dummy>  ::=  <A> | <B> | <C>

Naturally the brackets can be nested to any depth, as in the
following compact expression for a boolean expression:

<boolean expression> ::=
    list [
        list <boolean primary> separator [ ∧ | and ] ]
    separator [ ∨ | or ]

(iv)   list  <A>  =     <A>    <A>  * 

(v)   list  <A>  separator  <B>  =  <A>  [  <B>  <A>  ]  * 

(vi)  <any>     Any  symbol  whatsoever. 

(vii)  but  <A>  This  is  normally  used  in  conjunction  with 
<any>  to  express  things  like, 
<comment>   ::=   comment  [<any>  but  ;]* 
(viii)   not  <A>,   ahead  <A>,  back  <A> 

These  are  special  purpose  devices  used  occasionally  for 
error  recovery  and  for  certain  optimization  tricks. 
Their  meaning  should  be  apparent  (for  a  description  see 
Section  2.3  on  implementation). 
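
The abbreviations (i) through (vii) can be sketched as small recognizer
combinators. The following Python is only an illustration, not the
B-5500 implementation: the names term, seq, alt, star, opt, list_of and
any_but are hypothetical, and each recognizer is assumed to take a token
list and a position and to return the new position, or None on failure.

```python
# Hypothetical combinators; a recognizer maps (tokens, pos) to a new
# position on success or None on failure.

def term(t):
    def rec(toks, i):
        return i + 1 if i < len(toks) and toks[i] == t else None
    return rec

def seq(*parts):
    def rec(toks, i):
        for p in parts:
            i = p(toks, i)
            if i is None:
                return None
        return i
    return rec

def alt(*choices):                 # bracket construct [ a | b | c ]
    def rec(toks, i):
        for c in choices:
            j = c(toks, i)
            if j is not None:
                return j
        return None
    return rec

def star(a):                       # <A> *
    def rec(toks, i):
        while True:
            j = a(toks, i)
            if j is None or j == i:    # stop on failure or no progress
                return i
            i = j
    return rec

def opt(a):                        # <A> ?
    def rec(toks, i):
        j = a(toks, i)
        return i if j is None else j
    return rec

def list_of(a, sep=None):          # list <A>, list <A> separator <B>
    if sep is None:
        return seq(a, star(a))
    return seq(a, star(seq(sep, a)))

def any_but(t):                    # <any> but t
    def rec(toks, i):
        return i + 1 if i < len(toks) and toks[i] != t else None
    return rec

# <comment> ::= comment [<any> but ;]*
comment = seq(term("comment"), star(any_but(";")))
```

With these definitions, list_of(term("x"), term(",")) stops before a
trailing separator, which matches the expansion <A> [ <B> <A> ] * given
in (v).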

The language has a few rules of formation (i.e., syntax) which
are more than minimal in the sense that certain artificial restrictions
are placed on the collections of symbols which will be accepted, in order
to protect the programmer from minor lexical errors.

(ix)  <, >, [, ], ;, ?, |, *, @, #, list, separator,
not, open, close, ahead, but, <any>
may not be used as terminal symbols without being
preceded by #; e.g., # <, # > ("<", ">" also allowed).

(x)   Each  production  must  be  terminated  by  ";". 

(xi)   separator  may  not  be  used  without  list. 

(xii)   Only  letters,  digits,  and  spaces  may  appear  between 
"  <  "  and  "  >  ". 

(xiii)  The null symbol λ must always be represented by < >.

One  of  the  properties  of  BNF  which  TBNF  conceals  is  the 
difference  between  left  and  right  recursiveness;  i.e., 

<arithmetic> ::= <arithmetic> + <term> | <term>

being  a  left  recursive  production  implies  that  a  term  once  found  is  to 
be  absorbed  into  the  <arithmetic>.  The  right  recursive  production 

<arithmetic> ::= <term> + <arithmetic> | <term>

on  the  other  hand,  implies  that  the  whole  <arithmetic>  is  to  be  strung 
out  and  finally  reduced,  term  by  term,  from  the  right  hand  end  of  the 
string. 
(xiv)  Explicit instructions must be written in TBNF to distinguish
the two possible cases of recursion:

list <A> open     similar to right recursion; implies
                  that the entire string of <A>'s be
                  assembled before reduction

list <A> close    similar to left recursion; permits a
                  reduction to be made after every <A>
                  is found.

Kleene star (*) follows exactly the same convention. The
default conditions are:

(a)  action call immediately following implies open

(b)  any other construct implies closed
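
The difference between the two conventions can be illustrated by the
shape of the reductions they produce. The sketch below is in Python and
is an assumption-laden illustration only: reduce_close and reduce_open
are hypothetical names, and a "reduction" is modelled as building a
nested tuple over a list of already recognized terms.

```python
def reduce_close(terms):
    """Reduce after every item is found (left recursive shape)."""
    acc = terms[0]
    for t in terms[1:]:
        acc = (acc, "+", t)
    return acc

def reduce_open(terms):
    """Assemble the whole string first, then reduce from the right end."""
    acc = terms[-1]
    for t in reversed(terms[:-1]):
        acc = (t, "+", acc)
    return acc
```

For the terms a, b, c the first function groups as ((a + b) + c) and
the second as (a + (b + c)).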

2.3  The  Recognition  Algorithm 

A  top  to  bottom  recursive  descent  method  has  been  chosen  for 
the  syntax  analysis,  for  the  following  reasons: 

(i)  The algorithm presented here can recognize a large class
of languages. It will be established in Chapter 4 that
the method can accept at least all context free
languages (i.e., languages which have a context free grammar).

(ii)  The  interpretation  taken  by  the  algorithm  is  immediately 
obvious  from  the  syntax.   There  are  many  properties  of 
grammars  which  are  well  formulated  theoretically  but 
which are exceedingly difficult to see in typical grammars.
Being  LR(k)  or  ambiguous,  for  example,  are  properties 
which  are  not  in  any  sense  of  the  word  obvious  in  a 
grammar.   On  the  other  hand,  experience  has  verified 
that  programmers  quickly  develop  a  feeling  for  the  mean- 
ing of  what  they  write  in  TBNF  and  learn  to  program  in  it 
efficiently.   It  should  be  added  that  the  preprocessing 
algorithm  makes  no  attempt  to  detect  ambiguity  or  any 
other  elegant  theoretical  property  of  a  source  grammar. 
This  can  be  heralded  as  a  feature  in  view  of  the  fact 
that  several  grammars  submitted  to  the  system  are  known 
to  be  ambiguous  and  have  been  consciously  written  that 
way  to  achieve  a  specific  programming  advantage. 

(iii)  The  preprocessing  task  for  a  top  to  bottom  parsing  is 
very  simple  to  write.   This  property  is  of  inestimable 
advantage  when  a  programming  language  is  still  being 
formulated and a large number of syntaxes are tried on
the  way  to  the  final  result. 

(iv)  Top  to  bottom  analysis  facilitates  the  partitioning  of 
the  syntax  of  a  language.  For  example,  the  code  for 
translating  declarations  can  be  written  almost  indepen- 
dently of  the  code  for  translating  arithmetic  expressions. 
The  two  parts  can  then  be  joined  together  to  form  the 
whole language without parasitic side effects. This
has "systems" significance which bottom-up compilers seem
to lack; viz., bottom-up preprocessors typically require
the entire grammar to be processed each time.

2.4  Distinctive Characteristics of This Algorithm

Certain seemingly trifling details bear some careful exploration
because of the rather remarkable effect they have on the power of the
algorithm (as discussed in Sections 4.2 - 4.4).

(i)  Input terminal symbols are buffered and are accessed by
an integer procedure FETCH. Once a nonterminal has
been  recognized,  its  constituents  are  removed  from  the 
buffer  and  are  replaced  by  that  nonterminal.  Thus,  if 

<a> ::= begin <x> <y> end

then the input configuration might be

<l> <m> begin <x> <y> end else

immediately before reduction and will be

<l> <m> <a> else

after the reduction.
This means that the algorithm can easily be trapped into following
wrong paths.* It is remarkable that this property of following wrong
paths actually can be put to advantage and increases the class of languages
which can be described by the system.

* For example:

<K> ::= <Y> | <Z> ;
<Y> ::= begin end ;
<Z> ::= begin end comma ;

will be trapped because the algorithm will reduce the begin end pair
to <Y> without looking at the comma following.
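
The trap described in the footnote can be reproduced with a few lines
of Python. This is a sketch under stated assumptions, not the actual
recognizer: reduce_at and parse_k are hypothetical names, the buffer is
a plain list, and alternatives are tried in the order written with no
backtracking once a reduction has been made.

```python
def reduce_at(buf, pos, rhs, lhs):
    """If rhs occurs at pos, replace it in place by lhs (never undone)."""
    if buf[pos:pos + len(rhs)] == rhs:
        buf[pos:pos + len(rhs)] = [lhs]
        return True
    return False

def parse_k(buf):
    # <K> ::= <Y> | <Z> ;  alternatives are tried in written order
    if reduce_at(buf, 0, ["begin", "end"], "<Y>"):
        buf[0] = "<K>"
        return True
    if reduce_at(buf, 0, ["begin", "end", "comma"], "<Z>"):
        buf[0] = "<K>"
        return True
    return False

buf = ["begin", "end", "comma"]
parse_k(buf)
# The comma is left behind: the <Y> alternative fired first.
```

Writing the longer alternative <Z> first would avoid the trap for this
particular grammar, which is the kind of ordering decision the
programmer is expected to make.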

(ii)  The empty symbol (λ) is forced into the buffer; hence,
for the productions:

<A> ::= <B> < > <C> ;
<B> ::= b ;
<C> ::= c ;

the system will make the following reductions on the
input stream b c:

1)  b c
2)  <B> c
3)  <B> < > c
4)  <B> < > <C>
5)  <A>

This represents another way in which the system can be trapped.
The empty symbol can be avoided altogether by using ? or * (as defined
above). However, certain simplifications of semantics routines often
result by putting a λ in a production.

2.5  Semantic  Linkage 

Traditionally,  semantic  routines  have  been  subroutine  calls 
which  are  placed  at  the  end  of  productions.   The  reason  for  making  such 
a  restriction  is  that  it  is  only  at  the  end  of  a  production  that  one 
really knows if the various components of that production have been found,
and  it  is  only  at  the  end,  therefore,  that  the  semantics  routine  can  reli- 
ably be  called.  However,  since  the  programmer  knows  full  well  that  the 
word  go  to  cannot  herald  anything  but  a  go  to  statement  and  that  anything 
else  is  an  error,  it  is  reasonable  to  permit  semantic  calls  at  any  point 
at  all  in  the  parsing  scheme;  for  example,  in 

<block>  : : =  begin  <declaration>  * 

list  <statement>  separator  #;   end  ; 

the  most  convenient  place  to  insert  the  semantic  action  which  opens  block 
storage  is  immediately  after  the  begin.  Accordingly,  semantic  calls  are 
permitted  anywhere  in  a  TBNF  production  except  in  a  list  construct.   Thus, 
the  above  becomes 


<block>  ::=  begin  @  S  3  <declaration>  * 


A  semantic  call  is  denoted  by  the  marker  @  which,  in  the 
implementation  being  described,  must  be  followed  by  S  or  T  and  an 
integer;  i.e., 

<semantic call> ::= @ [ S | T ] <integer> ;

S  for semantics  -  a pure semantic routine which does
                     not interact with the syntax analysis,

T  for test       -  a semantic routine which also performs
                     a syntactic function (e.g., test for an
                     appropriate declaration). In order for the
                     syntax analysis to proceed on the present
                     branch, a boolean variable "SEMANTICTEST"
                     must be set to true. For example,

                     <label> ::= <identifier> @ T 3 ;

                     where action number 3 tests that the
                     identifier has not been declared as an
                     <integer> or in some other way illegally
                     introduced. It is then possible to say,

                     <statement> ::= [ <label> : ] *
                                     [ <go to statement> |
                                       <for statement> |
                                       <assignment statement> ] ;
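
The gating effect of a T action can be sketched as follows. The Python
below is illustrative only: the symbol table declared, the function
semantic_test_3 and the recognizer label_test are hypothetical
stand-ins, and an identifier is approximated by Python's own
str.isidentifier test.

```python
declared = {"i": "integer"}        # hypothetical symbol table

def semantic_test_3(name):
    """Stand-in for action 3: true only if name is not yet declared."""
    return name not in declared

def label_test(tokens, pos):
    """<label> ::= <identifier> @ T 3 ;  returns new position or None."""
    if pos < len(tokens) and tokens[pos].isidentifier():
        if semantic_test_3(tokens[pos]):   # plays the role of SEMANTICTEST
            return pos + 1
    return None
```

Here label_test(["loop", ":"], 0) succeeds, while an identifier already
declared as an integer makes the branch fail, so the parser moves on to
the next alternative.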




3.   ON  EXECUTABLE  CODE 


3.1  Discussion 


An interpretive system can be visualized as in Figure 1.

[Diagram: boxes labelled pseudo code, data, interpreter.]

Figure 1.  An interpretive system

A noninterpretive system can be viewed in an exactly isomorphic
way, as shown in Figure 2.

[Diagram: boxes labelled machine code, machine, output, data.]

Figure 2.  A noninterpretive system
An advantage of interpretation is the use of more compact code,
while noninterpretive schemes allow faster execution (by one order of
indirectness). A table driven (interpretive) system has the advantage
that the tables can be altered (even perhaps at run time), but it is
not at all clear that altering such tables will be any less burdensome
than patching code, especially if the tables have been carefully designed
to optimize run time speed.

Since neither the mechanism for changing the tables nor the
need to effect such change was clear, it was decided to bootstrap from
the original table driven TBNF compiler to a translator which outputs
Algol code. The transformation to machine code is thus a two step
process.


Figure  3.   Direct  translation. 


There is no reason, in principle, why the TBNF cannot be
translated directly into machine code. In this paper, however,
discussion is restricted to the translation into Algol code.
3.2  The Equivalent Algol Code

A  boolean  procedure  is  defined  for  each  nonterminal  in  the 
grammar.   Thus,  to  the  nonterminal  <A>  there  corresponds  a  boolean  pro- 
cedure ATEST(R)  which  returns  the  value  true  if  an  <A>  is  present  begin- 
ning at  position  R  of  the  input  buffer.  Then  the  production 

<A> ::= <B> <C>

is translated into:

boolean procedure ATEST(R) ;
value R ;  integer R ;
begin integer N ;
    N := R ;
    ATEST := false ;
    if BTEST(N) then N := N + 1
                else go to NXTALT ;
    if CTEST(N) then N := N + 1
                else go to NXTALT ;
    DELETE (R, N-R) ;
    ATEST := true ;
    INPUTBUFFER[R] := AIDNO ;
NXTALT:  end ;
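
For readers who want to experiment, the same procedure can be
transcribed into Python. This is a sketch, not the generated code: the
buffer is a plain list, btest and ctest are simplified stand-ins that
only check for an already reduced nonterminal, and delete is assumed to
collapse n elements into the single slot at position m.

```python
B_ID, C_ID, A_ID = "<B>", "<C>", "<A>"   # stand-ins for BIDNO, CIDNO, AIDNO

def make_test(symbol):
    """Simplified sub-recognizer: is `symbol` present at position r?"""
    def test(buf, r):
        return r < len(buf) and buf[r] == symbol
    return test

btest = make_test(B_ID)
ctest = make_test(C_ID)

def delete(buf, m, n):
    """Collapse n elements starting at m into the single slot m."""
    del buf[m + 1:m + n]

def atest(buf, r):
    """<A> ::= <B> <C> ; reduce in place and report success."""
    n = r
    for sub in (btest, ctest):
        if not sub(buf, n):
            return False          # corresponds to go to NXTALT
        n += 1
    delete(buf, r, n - r)
    buf[r] = A_ID                 # INPUTBUFFER[R] := AIDNO
    return True
```

On the buffer <B> <C> x the call atest(buf, 0) succeeds and leaves
<A> x; on <B> x it fails and leaves the buffer untouched.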

There are two auxiliary procedures required by the process.
The first is integer procedure FETCH(n), which accesses the n-th element of
the input buffer. It calls the scanner when required to read a new
symbol from the input stream.




The second procedure, DELETE(m,n), is concerned with moving data
within the input buffer by performing reductions on the input. Its
function is to delete in the input buffer beginning at position m,
collapsing n elements. Hence, after DELETE(m,4), the configuration

A    B    C    D    E
↑
m

of the input buffer becomes

A    E
↑
m
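
A minimal Python sketch of the two auxiliary procedures follows. It
rests on assumptions that the text does not spell out: the class name
InputBuffer and the "<eof>" marker are hypothetical, the scanner is
modelled as any iterable of symbols, and DELETE(m,n) is read from the
diagram above as collapsing n elements into the one slot at m.

```python
class InputBuffer:
    def __init__(self, scanner):
        self._scanner = iter(scanner)   # stand-in for the real scanner
        self.cells = []

    def fetch(self, n):
        """Return element n, reading new symbols only when required."""
        while len(self.cells) <= n:
            self.cells.append(next(self._scanner, "<eof>"))
        return self.cells[n]

    def delete(self, m, n):
        """Collapse n elements starting at m into one slot (cell m stays)."""
        if n <= 1:
            return
        self.fetch(m + n - 1)           # make sure the span is in the buffer
        del self.cells[m + 1:m + n]
```

Reading lazily in fetch keeps the scanner one symbol ahead at most, so
a recognizer that fails early never forces input it did not inspect.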

Terminal symbols, of course, do not require a boolean procedure
to represent them. For "+" we have

if FETCH(N) = "+" then N := N + 1
              else go to NXTALT ;

Given  these  primitive  operations,  it  is  easy  to  generalize  the 
process  to  obtain  Algol  code  which  is  equivalent  to  the  more  sophisti- 
cated TBNF  instructions  indicated  below. 

(i)  <A> * open =

L :  if ATEST(N) then
     begin N := N + 1 ;  go to L end ;

(ii)  <A> * closed =

L :  if ATEST(N) then
     begin
         DELETE (N-1, 1) ;
         go to L
     end ;

(iii)  <A> ? =

if ATEST(N) then N := N + 1 ;

(iv)  list <A> open =

second := false ;
L :  if ATEST(N) then
     begin
         second := true ;
         N := N + 1 ;
         go to L
     end ;
if not second then go to NXTALT ;

(v)  list <A> separator <B> open =

second := false ;
L :  if ATEST(N) then
     begin
         second := true ;
         N := N + 1 ;
         if BTEST(N) then
         begin
             N := N + 1 ;
             go to L
         end
     end else N := N - 1 ;
if not second then go to NXTALT ;

(vi)  ahead t =

if FETCH(N) ≠ "t" then go to NXTALT ;

(vii)  not t =

if FETCH(N) = "t" then go to NXTALT ;

(viii)  <any> =

N := N + 1 ;

(ix)  but t =

if FETCH(N-1) = "t" then go to NXTALT ;


† The final "N := N - 1" in (v) is required to ensure that the separator
of a list genuinely punctuates a list:

C A B A B A        is acceptable
  ↑       ↑

C A B A B A B      is truncated
  ↑       ↑
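
The control flow of construct (v), including the final backing up over
a dangling separator, can be checked with a direct Python transcription.
This is a sketch only: list_sep is a hypothetical name, and is_a and
is_b are assumed to be simple predicates on single buffer elements
standing in for ATEST and BTEST.

```python
def list_sep(buf, n, is_a, is_b):
    """Position after `list <A> separator <B>` starting at n, or None."""
    second = False
    while n < len(buf) and is_a(buf[n]):
        second = True
        n += 1
        if n < len(buf) and is_b(buf[n]):
            n += 1              # consume separator, go to L
            continue
        return n                # list ended cleanly after an <A>
    if not second:
        return None             # corresponds to go to NXTALT
    return n - 1                # back up over the dangling separator
```

Both C A B A B A and C A B A B A B, started at position 1, stop at
position 6, so the trailing B of the second string is left in the
buffer.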
(x)  bracket construct.

In principle this can be handled as though the pseudo
production within brackets were written out as a full
production. Thus,

[ <A> | <B> | <C> D ]

will be translated to

if DUMMY(N) then N := N + 1
            else go to NXTALT ;

where DUMMY is the procedure which results from translating

<DUMMY> ::= <A> | <B> | <C> D ;

This  establishes  that  the  bracket  construct  does  have 
some  equivalent  Algol  code.   The  efficient  translation 
of  the  bracketed  construct  will  be  the  subject  of  a 
separate  section. 

(xi)  The final constructs which are to be implemented are the
various forms of the semantic action. Classically, these
are the numbered routines:

@ S <integer>, which is a call on the semantic routine
<integer>, becomes simply:

SEMANTIC( <integer>, R, N ) ;

@ T <integer> is the same as the above except that a
global variable SEMANTICTEST must be set true;
i.e., in Algol,

SEMANTIC( <integer>, R, N ) ;
if not SEMANTICTEST then go to NXTALT ;

These  are  very  straightforward  and  do  not  differ  from  the 
usual  table  driven  methods.   However,  translation  to  Algol 
code  permits  the  use  of  another  form  of  the  semantic  rou- 
tine which  is  very  efficient  for  some  applications.   To 
implement  the  semantic  routine  "blockpointer  := 
blockpointer  +  1",  for  example,  requires  overhead  to: 

(a)  enter  a  procedure  with  at  least  one  parameter  passed; 

(b)  branch  on  the  semantic  routine  number  (something  like 
an  Algol  switch  or  Fortran  computed  go  to); 

(c)  branch  to  the  end  of  the  block; 

(d)  procedure  exit. 

Even  with  a  machine  like  the  Burroughs  B-5500  the  overhead 
is  about  three  times  as  long  as  the  kernel  code,  but  it 
may  be  completely  avoided  by  using 

<inline  semantic  routine>  's 

which  copy  the  Algol  code  directly  into  the  code  being 
produced. 
The syntax for these routines is:

<inline semantic routine> ::=
    "@" [ S | T | Q ] "[" <algol part> "]" ;

(note that @, [, ] must be enclosed in quotation marks,
or be preceded by a #, because they are also elements of
the metalanguage.) Thus,

begin @ Q [ blockpointer := blockpointer + 1 ; .... ]

becomes

if FETCH(N) = beginword then N := N + 1
                        else go to NXTALT ;
blockpointer := blockpointer + 1 ; ....
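
The contrast between a numbered routine and an inline routine can be
sketched from the translator's side. The Python below is a deliberately
simplified illustration: emit_semantic is a hypothetical name, it
returns Algol text as a string, and it treats only S and T with an
integer and Q with an Algol part, ignoring the inline S and T forms.

```python
def emit_semantic(kind, body):
    """Return Algol text for one semantic action in a production.

    kind "S" or "T" with an integer body gives a numbered routine call;
    kind "Q" with a string body copies the Algol text inline.
    """
    if kind == "Q":
        return body
    lines = [f"SEMANTIC({body}, R, N) ;"]
    if kind == "T":
        lines.append("if not SEMANTICTEST then go to NXTALT ;")
    return "\n".join(lines)
```

The inline form emits exactly the kernel statement, so none of the
entry, branch and exit overhead listed in (a) to (d) appears in the
generated parser.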

3.3  Optimization 

Like  most  optimizations  the  ones  described  for  this  language 
are  ad  hoc,  comprising  the  detection  and  clever  coding  of  cases  which 
appear  frequently.   The  prime  target  for  optimization  is,  of  course, 
the  bracketed  construct.   By  using  [.  .  . ]  a  programmer  is  admitting 
to  the  machine  that  the  dummy  production  implied  by  the  brackets  is 
either  very  simple  or  occurs  in  a  very  few  places  in  the  syntax.  Note, 
however,  that  to  be  logically  consistent  it  is  imperative  to  preserve 
the  equivalence  of  [.  .  .]  and  dummy  productions.  Thus, 
<ae> ::= list [ list <p> separator [ × | / ] @S3 ]
         separator [ + | - @S4 ] ;

is functionally equivalent to

<ae>     ::=  list <t> separator <adop> ;
<adop>   ::=  + | - @S4 ;
<t>      ::=  list <p> separator <mulop> @S3 ;
<mulop>  ::=  × | / ;

Semantic routine 3 assumes the input buffer configuration defined by
its local context; i.e., the configuration defined within the square
brackets of which it is a part.

The  optimization  game  is,  of  course,  to  implement  more  efficient 
code  while  preserving  this  conceptual  neatness.   It  is  desirable  to  reduce 
the  number  of  calls  on  the  utility  routines  (FETCH,  DELETE,  etc.)  as  much 
as  possible.  All  of  the  optimizations  described  below  are  directed  at 
reducing  the  number  of  these  calls. 

(i)  The  alternatives  on  the  right  hand  side  of  a  production 
will,  in  some  cases,  occupy  only  one  place  of  the  input 
buffer  during  execution;  e.g.: 

<a> ::= <b> <d> *
        list <b> |
        [ <x> <y> ] |
        < > |
        <c> @Q[;] ;




Each  alternative  on  the  right  hand  side  occupies  one  place 
in  the  input  buffer,  so  the  DELETE  procedure  need  never  be 
called  in  the  recognition  of  this  production. 

(ii)  In  some  uses  of  the  closed  Kleene  star,  two  calls  on 
DELETE  are  implied;  e.g.,  in 

<a> ::= [ <x> <y> ] * ;

DELETE  is  called  once  to  contract  <x>  <y>  to  [  <x>  <y>  ], 

and  again  to  close  up  the  input  buffer  as  required  by 
the  Kleene  star  construct.  The  code  generated  need  only 
perform  one  deletion,  and  it  does  not  seem  difficult  to 
automatically  emit  the  following  code  to  recognize  <a>: 

L  :   if  XTEST(N)   then  N  :=  N  +  1 

else  go  to  NXTALT  ; 
if  YTEST(N)   then  N  :=  N  +  1 

else  go  to  NXTALT  ; 
DELETE  (N-3,  3)  ; 
go  to  L  ; 
NXTALT:   .... 

(iii)  In similar vein, it is possible to avoid one call on the
procedure DELETE when the production in brackets occurs
at the end of an alternative; e.g.,

<st> ::= if <b> then <st> [ else <st> ] ? ;
(iv)  Only one call on the procedure FETCH need be made when
each alternative of a production begins with a distinctive
terminal symbol; e.g.,

<statement> ::= if ....     |
                go to ....  |
                for ....    |
                begin ....  ;

Then it is possible to emit Algol code like

temp := FETCH(n) ;
if temp = ifword then ....
        else go to NXTALT ;

NXTALT:   if temp = gotoword then ....
          else go to NXTALT1 ;

NXTALT1:  if temp = forword then ....
          else go to NXTALT2 ;

NXTALT2:  if temp = beginword then .... ;
Notes:

(a)  It is worth pointing out that for a small list the above
sequential search is the fastest code to find the
appropriate "branch" of the tree. It is clearly faster than a
table lookup because it involves no subscripted references.

(b)  All  of  the  above  optimizations  can  be  implemented  in  a 
one  pass  compilation.   Some  of  these  are  already  present 
in  the  current  B-5500  implementation.   Other  more  global 
optimizations  clearly  require  more  than  one  pass. 
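
Optimization (iv) amounts to fetching the leading symbol once and then
comparing it sequentially against each alternative's first terminal. A
Python sketch under stated assumptions: statement_kind and the KEYWORDS
table are hypothetical, and fetch is passed in as a function so that
the number of calls on it can be observed.

```python
KEYWORDS = (
    ("if", "if statement"),
    ("go to", "go to statement"),
    ("for", "for statement"),
    ("begin", "block"),
)

def statement_kind(fetch, n):
    """Pick the alternative of <statement> using a single fetch."""
    temp = fetch(n)                  # the only call on FETCH
    for word, kind in KEYWORDS:      # sequential search, as in note (a)
        if temp == word:
            return kind
    return None
```

Without the optimization each alternative would call fetch afresh, so
the saving grows with the number of alternatives in the production.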


4.  THEORETICAL RESULTS

4.1  Definitions

(i)   Translatable  Backus  Naur  Form  (TBNF)  is  BNF  extended 
in  the  way  described  in,  and  interpreted  in  the  manner 
of,  Chapter  2. 

(ii)  A  recursive  descent  (RD)  machine  is  an  automaton  which 

executes  TBNF  in  the  manner  of  Chapter  3.   We  also  require 
that  the  stack  of  the  machine  be  a  linear  function  of  the 
length  of  the  input  string;  i.e.,  only  a  finite  number  of 
nonterminals  can  be  stacked  up. 

4.2  Discussion

The  author  claims  that  the  restrictions  imposed  on  BNF  do  not, 
in  fact,  restrict  its  capacity  to  describe  languages.   It  would  be  expec- 
ted that  TBNF,  being  equivalent  to  a  subset  of  BNF,  has  a  generative 
capacity  somewhat  less  than  BNF.   However,  because  of  the  way  in  which 
the  syntax  is  interpreted,  the  RD  machine  is  actually  as  powerful  as 
a LBA (linear bounded automaton). Thus, the class of languages describable in TBNF is not merely
the  class  of  context  free  languages,  but  the  class  of  context  sensitive 
languages.   This  rather  remarkable  result  is  a  direct  consequence  of  the 
algorithmic  interpretation  imposed  on  TBNF;  the  algorithm  happens  to  be 
sufficiently  general  to  be  used  as  a  LBA,  albeit  with  a  very  devious 
instruction  set. 
4.3  Theorem 1: A Lower Bound For the Power of a RD Machine

A RD machine is as powerful as a LBA.

4.3.1  Discussion of the Theorem

The  method  of  proof  is  constructive.   Given  a  set  of  LBA 
instructions, it is possible to build a grammar which imitates those
instructions.  Much  use  will  be  made  of  several  specialized  features 
of  the  RD  machine  (RD  grammar).  Particular  note  should  be  taken  of  the 
following  peculiarities. 

(i)  <A>  ::=  <> 

effectively  forces  the  nonterminal  <A>  into  the  input 
stream.  Thus,  it  is  possible  to  write  on  the  input 
string.  Likewise, 

<A>  ::=  <B>  ; 
is  a  production  which  will  cause  the  machine  to  overwrite 
<B>  with  <A>. 

(ii)   ahead,  back,  not,  but  do  not  cause  any  reductions  to  be 

made.  They  are,  so  to  speak,  read  only  productions;  thus, 

<a> ::= ahead <b> <c> ;
<c>  ::=  <b>  ; 

is  a  method  of  writing  <a>  only  when  a  <b>  is  present 
in  the  input  stream,  ahead  <b>  merely  checks  for  a 
<b>,  then  the  <c>  forces  a  change  in  naming. 
(iii)  It has been pointed out before that the system will not
unpick a reduction once it has been made. This means
that an erroneous path taken in parsing can nevertheless
make reductions and, hence, alter the input stream. Thus,

<E> ::= <A> <never>

may well be a device for writing <A> on the input, where
<never> is a special nonterminal

<never> ::= η ;

with η ∉ V, the set of terminal symbols. The <never>
causes the RD machine to abandon any attempt to reduce
the current alternative.

4.3.2  The LBA

As a canonical example of a LBA, we take a machine M with a
finite number of states Q_1, ..., Q_n and a read/write head which can
advance, reverse, read, or write on an input tape.  The behavior of
the machine can be described by a set of rules of three types:

(i)   R_x : (Q_i, I_l) -> (Q_j, I_k)

In state Q_i, read I_l, go to Q_j, and write I_k;

(ii)  R_x : (Q_i, I_l) -> (Q_j, +)

In state Q_i, read I_l, go to Q_j, and advance the read head;

(iii) R_x : (Q_i, I_l) -> (Q_j, -)

In state Q_i, read I_l, go to Q_j, and reverse the read head.

The LBA is in state Q_1 when switched on, and the read head is
poised above the first input character.  It finishes by going into a
state Q_f.
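The three rule types can be exercised with a small simulator.  The
following Python sketch is illustrative only and is not part of the
original system: the function name run_lba, the rule table layout, the
state names "Q1" and "Qf", and the step limit are all assumptions, and
the tape alphabet is assumed not to contain the characters "+" and "-".

```python
def run_lba(rules, tape, start="Q1", final="Qf", max_steps=10_000):
    """Simulate a machine whose rules map (state, symbol) to (state, op).

    op is "+" (advance the head), "-" (reverse the head), or any other
    symbol, which is written at the current head position.
    Returns (accepted, final_tape_contents).
    """
    tape = list(tape)
    state, head = start, 0
    steps = 0
    while state != final:
        if steps == max_steps:
            return False, "".join(tape)      # give up: possible loop
        steps += 1
        action = rules.get((state, tape[head]))
        if action is None:
            return False, "".join(tape)      # no applicable rule
        state, op = action
        if op == "+":
            head += 1
        elif op == "-":
            head -= 1
        else:
            tape[head] = op                  # rule type (i): write
        if not 0 <= head < len(tape):
            return False, "".join(tape)      # head left the bounded tape
    return True, "".join(tape)


# Hypothetical machine: rewrite every "a" as "b", stop at the "$" marker.
rules = {
    ("Q1", "a"): ("Q1", "b"),   # type (i): write
    ("Q1", "b"): ("Q1", "+"),   # type (ii): advance
    ("Q1", "$"): ("Qf", "$"),   # enter the final state
}
print(run_lba(rules, "aab$"))
```

The head never leaves the cells occupied by the input, which is the
linear bound that distinguishes this machine from a general Turing
machine.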

For each state of the machine Q_i, introduce nonterminals
<q_i^+>, <q_i^->, <q_i^0>, <q_i> and for each terminal symbol t_i introduce
a nonterminal representative <t_i>.


4.3.3  Proof of the Theorem


(i)  First,  the  entire  input  stream  must  be  entered  into  the 
input  buffer  and  be  punctuated  with  spaces  which  will  be 
used  later  to  record  the  state  of  the  machine. 

<initial symbol> ::= <edit input> |
                     <apply rules> ;

<edit  input>      : : = 

<qn>  list  [  <t,>  |  <t0>  |  ....  |  <t  >  ] 
^1   — •—     1   '    2   '         '    m 

separator   [  "^q-i^  ]  open  <never>  ; 


(ii)  For each rule R_x : (Q_i, t) -> (Q_j, s) of the LBA write


<R  >  : : =  ahead  <q.> 


[  <q.^>  ahead  <t>  [  <s>  <never>  ]  ? 
J 

<q.>  ]  <never>  ; 
The third line of the production restores <q_i> if the
input stream fails to have <t>.  Note that productions
will be introduced later to allow any state <q_i> to be
reduced to <q_j>.

(iii)  For LBA instructions of the form

R_x : (Q_i, t) -> (Q_j, +)

write


<R  >  ::=  ahead  <q.> 


[  <q.4>     ahead     <t> 


<q.>         <never> 


The  mechanism  is  similar  to  (ii). 
(iv)  For LBA instructions of the form

R_x : (Q_i, t) -> (Q_j, -)

write

<R  >  ::=  ahead  <q.> 


x 


[  <q.">  ahead  <t>  I 

<q.>  ]  <never>   ; 
(v)  Each of the productions above is designed to execute one
instruction  of  the  machine.  Now  a  production  is  defined 
which  administers  the  above  three  types  of  productions 
and  causes  the  machine  to  advance  or  reverse  on  the 
input  stream. 


<apply  rules>  ::=  <!*-.>  |  <R?> 


<R  > 
m 


ahead  <q.  >  [  <B>  <any>  <q.>  <never> 


<B>  <any>  <apply  rules>  ]   (i=l,...n) 


°> 


ahead  <q.^>  <q.>  <never> 
1     1 


ahead  <q.>  <apply  rules>  | 
ahead  <B> 


^ 


S(i=l,...,n)   ....  (E^) 


[[  <q.>  <any>  ahead  <q.  ~>  | 

]  <never> 
<apply  rules>  ]  | 

ahead  ^^    ;  ahead  <qf  >  |  ahead  <q„ "^>  |  ahead  <qf ^>  ; 


J 


.  (El) 


.  (E2) 


(i=l, ...,n)   ....  (E3) 


.  (E5) 


Notes  on  these  rules: 

E1:  These rules apply each of the instructions in the repertoire
of the machine;

E2:  If the machine is to advance on the input string ( <q_i^+>
present), then mark the input with the present state and
apply the rules again;




E3:  Apply  the  rules  again; 

E4:  A  reversal  is  produced  by  backing  up  the  right  recursive 
production  for  <apply  rules>,  where  a  back  up  is  marked 
by  <B>  and  the  productions  scan  forward  to  retrieve  the 
previous  state  of  the  machine; 

E5:  The final acceptance state.

(vi)  Lastly,  the  productions  which  allow  state  exchange: 


<q.  >>  ::=  ahead  <q  ^>  <qn^> 


"N 


ahead  <(^>  <(l^>      I 


\    x  =  +,  -,  0 


ahead  <q  ^>  <(L*> 


J 


<B>   |  <   >  ; 
for  i=l, ...,n. 

<B> also functions as a state, so

<B> ::= ahead <q_i^x> <q_i^x> ;    for all i, x

Finally, the terminal representative:

<t_i> ::= t_i ;    i=1,...,m.




The  rules  for  transforming  a  LBA  instruction  set  to  an 
equivalent  RD  instruction  set  are  thus  established,  which 
completes  the  proof. 

4.3.4  The Converse of Theorem 1

By storing the possible parses on the input tape, it is trivially
possible to imitate a RD machine on a LBA.

4.4  Theorem 2: Ambiguity

There  exists  no  effective  procedure  for  deciding  whether  or  not 
any given context free grammar G is ambiguous [8].

Proof: 

Suppose that there exists an algorithm A which, when presented
with any grammar G, can decide whether or not G is ambiguous.  Then select
arbitrary pairs of strings (f_i, g_i) (i=1,...,n) of elements from some set
V = {a, b, c, ..., z} and define G as

<initial symbol> ::= <X> | <Y>

<X> ::= f_1 <X> \tilde{g}_1 |
        f_2 <X> \tilde{g}_2 |
        ....
        f_n <X> \tilde{g}_n | ¢

<Y> ::= a <Y> a |
        b <Y> b |
        ....
        z <Y> z | ¢

where $\tilde{g}_i$ is the reverse of the string $g_i$ and ¢ is a center marker.

Then ambiguity in this grammar corresponds to the existence of some
string of elements from V which is a terminal derivative of both <X> and
<Y>.  Therefore, the algorithm A must be able to decide whether there
exists a string S:

$$S = f_{i_1} f_{i_2} \cdots f_{i_k} \,\text{¢}\, \tilde{g}_{i_k} \cdots \tilde{g}_{i_2} \tilde{g}_{i_1}$$

$$\phantom{S} = a_1 a_2 \cdots a_p \,\text{¢}\, a_p \cdots a_2 a_1$$

$$\phantom{S} = h \,\text{¢}\, \tilde{h}$$

That is, if there exists a string h:

$$h = f_{i_1} f_{i_2} \cdots f_{i_k} = g_{i_1} g_{i_2} \cdots g_{i_k}$$

which is exactly Post's correspondence problem, which, in turn, is
equivalent to the halting problem [4].  Hence, the algorithm A does
not exist.
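The correspondence between ambiguity and Post's problem can be
checked on small instances.  The Python sketch below is an illustration
under stated assumptions, not part of the original proof: the function
names, the bounded search depth, and the use of "¢" for the center
marker are all hypothetical choices.  It builds the string derived from
<X> for a nonempty index sequence and tests whether the same string is
also derivable from <Y>, i.e., whether it has the form h ¢ reverse(h).

```python
from itertools import product

MARKER = "¢"  # assumed symbol for the center marker


def x_string(pairs, seq):
    """Terminal string derived from <X> using the pairs indexed by seq."""
    left = "".join(pairs[i][0] for i in seq)
    right = "".join(pairs[i][1][::-1] for i in reversed(seq))
    return left + MARKER + right


def is_y_string(s):
    """True if s has the form h MARKER reverse(h), as derived from <Y>."""
    h, sep, tail = s.partition(MARKER)
    return sep == MARKER and tail == h[::-1]


def find_ambiguous_string(pairs, max_len):
    """Bounded search for a nonempty index sequence giving a common string."""
    for k in range(1, max_len + 1):
        for seq in product(range(len(pairs)), repeat=k):
            s = x_string(pairs, seq)
            if is_y_string(s):
                return seq, s
    return None


# A well-known solvable instance of the correspondence problem.
pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(find_ambiguous_string(pairs, max_len=4))
```

The search is bounded by max_len; no bound suffices in general, which
is the content of the theorem.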
4.5  Some Results on Timing

It  is  difficult  to  obtain  meaningful  theoretical  comparisons 
between  various  parsing  methods.   In  practice,  experts  have  discovered 
that no system has a clear cut advantage over all of its competitors [1].
Each  system,  in  effect,  takes  advantage  of  a  certain  corner  of  the  very 
large  data  base  consisting  of  language  specifications  and  the  program  in 
hand.  Top  down  methods  tend  to  rely  on  the  intrinsic  properties  of  the 
language  concerned.  Bottom  up  methods  rely  more  on  the  program  being 
translated.   There  is  no  evidence  that  a  system  which  combines  the  two 
approaches  will  necessarily  run  faster  in  practice  than  either  of  the 
two  component  parts.   Part  of  the  reason  for  this  is  that  for  a  large 
subset  of  precisely  defined  languages  (e.g.,  programming  languages)  the 
two  algorithms  follow  almost  identical  paths,  in  some  cases.   Consider, 
for  example,  the  following  production: 

<statement> ::= if <boolean> .... |
                for <variable> .... |
                begin list <statement> .... |
                go to <designational> |
                list [ <variable> := ] .... ;

Typical  top  to  bottom,  bottom  to  top  techniques  will  do  a  sequential 
search  on  the  key  words  (if,  for,  begin,  etc.)  to  decide  which  branch 
of  the  syntax  tree  to  take.  The  subsequent  stacking  operations  are 
likely  to  be  similar  in  both  methods.  For  such  a  class  of  structures, 
it  is  largely  irrelevant  which  method  is  being  used. 
There are cases, however, for which top down methods can be
distinctly inferior.  These are typically cases in which there exists
a great deal of hierarchy or, in other terms, where there is a very
definite precedence relation between a large number of operations.
Consider, for example, a precedence structure with n operators P_1, ...,
P_n with the precedence relations P_1 < P_2 < ... < P_n.  An expression
is of the form

N P N P N P ....

where the N's are operands and the P's are operators.  The grammar for
such expressions is

<E_n> ::= <E_{n-1}> [ P_n <E_{n-1}> ] * ;
   ....
<E_1> ::= N [ P_1 N ] * ;

Using the top down method, a stacking operation will be required for
each change in level of precedence.  Thus,

N P_1 N P_4 ...

will involve the system in three stacking operations as it winds up and
down the syntax tree.  Assuming the operators are randomly distributed,
we have the following result:  in n of the n^2 possible juxtapositions of
operators, there will be 0 stacking operations; and in 2(n-i) cases,
there will be i stacking operations (i=1,...,n).
Forming the average number of stacking operations per operator, we get:

$$E_t = \frac{n}{3} - \frac{1}{3n} \approx \frac{n}{3}$$

By contrast, a precedence method can get away with approximately 1/2 stack
operation per operator:

$$E_p = \frac{1}{2} - \frac{1}{2n} \approx \frac{1}{2}$$



Parentheses  can  be  incorporated  into  either  system  with  equal  efficiency. 
Note  that  precedence  techniques  assign  precedence  levels  to  "("  and  ")". 
This  is  not  done  here  and  the  top  down  methods  handle  such  problems  by 
recursion,  a  task  involving  1  stacking  operation. 
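The two averages can be checked by direct enumeration.  The Python
sketch below is illustrative only; the function names are hypothetical,
and it assumes that a precedence method stacks exactly when the second
of two adjacent operators has the higher precedence, which is an
interpretation and is not stated in the text.  For the top down method
it counts |a - b| stacking operations for adjacent operators of levels
a and b, which matches the count of 2(n-i) cases with i operations.

```python
from fractions import Fraction


def average_top_down(n):
    """Mean stacking operations per operator pair for the top down method."""
    levels = range(1, n + 1)
    total = sum(abs(a - b) for a in levels for b in levels)
    return Fraction(total, n * n)


def average_precedence(n):
    """Mean stacking operations per pair, assuming one stack per rise."""
    levels = range(1, n + 1)
    stacked = sum(1 for a in levels for b in levels if a < b)
    return Fraction(stacked, n * n)


for n in (2, 5, 10):
    # Closed forms: n/3 - 1/(3n) and 1/2 - 1/(2n).
    assert average_top_down(n) == Fraction(n, 3) - Fraction(1, 3 * n)
    assert average_precedence(n) == Fraction(1, 2) - Fraction(1, 2 * n)
    print(n, average_top_down(n), average_precedence(n))
```

The top down cost grows linearly with the number of precedence levels,
while the precedence cost stays below one half.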




5.   CONCLUSION

The system described above has proved to be a satisfactory
tool for syntax directing the first pass of a compiler.  It could be
trivially modified to be used in later passes if that were desired.
It also has application in certain translation processes (e.g., translating
from a matrix language into PL/I).  However, the extensive use
of [ <any> but ... ] * constructs suggests that a scan operation should
be included as a legitimate element of the syntax.  This is one of several
additions which have been examined for TBNF which, although not
changing the recognizing ability, make for faster and neater code (in
the spirit by which TBNF arose from BNF).

The decision to translate TBNF into Algol code is a significant
departure from a table driven scheme.  Exponents of table driven methods
will undoubtedly see this as a retrograde step.  However, the advantages
to be gained by table driving are somewhat undermined by a comparatively
clear algorithmic language.  For example, there is no point in interpreting
a table when it is just as easy to recompile from the source
code.  The obvious next step is to translate directly from TBNF to
machine code, a step which could improve speed by another factor of 3.


APPENDIX  A 
RECURSIVE  DESCENT,  TOP  TO  BOTTOM  ANALYSIS 

Number the components of the right hand side of a production
as follows.

           1      2      3
<a>  ::=  <b>    <c>    <d>  |
          <e>    <f>  ;
           1      2

Then define a (recursive) procedure "presence" which maps the
cartesian product of V_N and I, where

V_N = set of all nonterminals in the grammar,
I = {natural numbers},

onto the set {true, false}; i.e.,

presence: V_N × I → {true, false}

If there is a production such as the one above, then the following
relation will exist between certain pairs in V_N × I; viz., from above,

presence( <a>, n ) = presence( <b>, n ) ∧
                     presence( <c>, n + 1 ) ∧
                     presence( <d>, n + 2 )
                   ∨ presence( <e>, n ) ∧
                     presence( <f>, n + 1 )

(Note that "=" is used here in the usual assignment sense and not in
the mathematical sense; i.e., the left hand side is found by computing
the right hand side in a left to right fashion.)
For  a  terminal  symbol  t  define 

presence  (t,n)   =   (FETCH(n)  =  t) 

where  FETCH(n)  is  the  n-th  symbol  of  the  input  string. 

The  recursive  descent  (RD)  method  proceeds  by  repeatedly 
rewriting  a  given  presence  function  with  its  equivalent  right  hand  side. 
The  recursive  process  stops  when  a  presence  function  finally  degenerates 
into  a  terminal  test. 

In the method used in this paper, it should be observed that
once a presence function has been completely evaluated (i.e., successfully
recognized) part of the input stream is rewritten with the nonterminal
which has been recognized.  This has rather important consequences
as explained in Section 2.3.
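The presence function and the rewriting of the input stream can be
sketched in a few lines of Python.  This is a minimal illustration
under stated assumptions, not the Algol code the system generates: the
grammar is held in a dictionary, the input stream is a list of symbols,
and the names make_presence, grammar and stream are hypothetical.  As in
the method described, a successful recognition replaces the matched
symbols by the nonterminal, so each component occupies exactly one
position, and reductions made on an abandoned alternative are not
unpicked.  Left recursive productions would not terminate in this
sketch.

```python
def make_presence(grammar):
    """Build a presence(symbol, n, stream) test for the given grammar.

    grammar maps each nonterminal to a list of alternatives; each
    alternative is a list of symbols (terminals or nonterminals).
    """
    def presence(symbol, n, stream):
        # Terminal test, or a nonterminal already written on the stream.
        if n < len(stream) and stream[n] == symbol:
            return True
        if symbol not in grammar:
            return False
        for alternative in grammar[symbol]:
            pos = n
            for component in alternative:
                if not presence(component, pos, stream):
                    break
                pos += 1          # each recognized component fills one cell
            else:
                stream[n:pos] = [symbol]   # rewrite input with nonterminal
                return True
        return False

    return presence


# The production used above, with hypothetical terminal definitions.
grammar = {
    "<a>": [["<b>", "<c>", "<d>"], ["<e>", "<f>"]],
    "<b>": [["x"]], "<c>": [["y"]], "<d>": [["z"]],
    "<e>": [["p"]], "<f>": [["q"]],
}
presence = make_presence(grammar)

stream = ["p", "q"]
print(presence("<a>", 0, stream), stream)
```

Running the same test on the stream x y fails, but leaves <b> <c> on
the input, which shows the effect of not unpicking reductions.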
APPENDIX B
THE  SYNTAX  SPECIFICATION  OF  TBNF 

This  section  rigorously  defines  the  language  TBNF  and 
contemporaneously  defines  the  algorithm  for  accepting  strings  in  this 
language. 

The  semantic  routines  are  also  sketched. 


<syntax  specif ication>  ::= 


<nonterminal> 
<terminal> 


list  <production>  end  j 
::=  »<«   <identifier>  *  but  <any>  ">"  ; 

<any>      but  ";"  but   "| "  but  "[  ••  but  »<« 

but  "]"  but  "#"  but  ""*■  but  "  >  " 

but  #  list       but  #  separator 
but  #  ahead      but  #  back 
but  #  but        but  #  open 
but  #  close       but  #  not 
but  "*"  but  ''©n  but   "?" 

"#"    <any>   | 

"«n  [  <any>  but  """  ]  *  """  ; 
<simple basic symbol> ::=
    <terminal> |
    "[" <right hand side> "]" |
    "[" <error> |
    <nonterminal> ;




<basic  symbo2> 


#  list  <simple  basic  symboP" 

[  #  separator  <simple  "basic  symbol>  ]  ? 
[  <action  call>  @SUU 
#  open        @S4U 


#  clo 


se 


@SU5  | 
@S^5  ] 


<   > 

#  list   <error> 

#  ahead  <terminal> 

#  ahead  <error> 

#  "but    <terminal> 

#  but    <error> 

#  not         <terminal> 
<action  call> 
<simple  basic   symbol> 

[ "  ?    " 
"  *   "   [  <action  call>     @SM+ 
#  open  .@S4i+ 


#  close 
<     > 

<     >   ]      I 

'<"  #  an^r      ">" 


@Sk6    | 
@Sh6    ] 


"<tt 


II>H 




comment 

Note  that  the  convention  of  open  and  closed  Kleene  stars  is 
repeated  here  because  the  semantic  actions  are  different; 

<action call> ::=
    "@" [ S | T ] <integer> |
    "@" [ S | T | Q ] "[" <algol part> "]" |
    "@" <error> ;

<algol part> ::= [ <any> but "]" but "["
                 | "[" <algol part> "]" ] * ;

comment

Note that the recursive method of matching [ with ] is not
the most efficient way of doing it;

<production> ::=
    <nonterminal> ::= <right hand side> ";" ;
<right hand side> ::=
    list <alternative> separator "|" close ;
<alternative> ::=
    list <basic symbol> ;
<error> ::= [ <any> but "|"
            but ";" ] * @S3 ;
APPENDIX C

THE TBNF GRAMMAR FOR THE LANGUAGE DEMALGOL,
AND THE CORRESPONDING ALGOL CODE PARSER


As a further example of the syntax of a language written in
TBNF, we take a very simple version of Algol 60 which is used by the
proponents of Translator Writing Systems to compare various systems
under development at the University of Illinois.  Notice the error
recovery built into the syntax and the use of semantic tests to determine
the exact path to be taken under certain circumstances.  It is
possible to avoid the use of semantic tests if desired; however, this
example is designed to adhere closely to operational conditions in
which this information is available and provides a neat way of parsing
the source string.

The  grammar  presented  below  parses  at  1200  cards  per  minute 
on  the  B-5500  (not  including  scanner  time). 
COMMENT THE FOLLOWING LANGUAGE IS USED AS A BENCH MARK BY TWS
DESIGNERS AT THE UNIVERSITY OF ILLINOIS. NO ATTEMPT HAS
BEEN MADE TO REDUCE THE NUMBER OF NONTERMINALS OR PRODUCTIONS
UNDERSTOOD NOMENCLATURE, THE LANGUAGE REQUIRES THREE

PRODUCTIONS FOR ITS DEFINITION ;
DEMALGOL
<PROGRAM> ::= <BLOCK> ;

<BLOCK> ::=

BEGIN

<DECLARATION>*

LIST <STATEMENT> SEPARATOR #;

END ;

<nFr|_ARATTON>  tt« 

[  INTEGER  •0(TyPEI«TNTFGFRTyPEI J/ 

BOOLEAN  fQCTYPEl-BOOLEANTYPEl  ]/ 

LABEL  •0[TVPE»«LABELTYPF   I]] 

[LIST  [<*!»  RS[ENTFR(Pl,TYpE)in 

SEPARATOR  .    / 

<  Error  >       i   j 

<STATEMrNT>  •  »■ 

f  <LAB^L>   I   1  * 

f   t  GO  TO   /   GOTO   1  <LABEL>   / 
IF  <BOOLEAN>  THFN  <STATEMENT> 

t  ELSE  <STATEMENT>  ]  ?  / 

<VARIABLE>  t  •»  /  *  1   <VALUF>  / 

<   >  [AHEAD  fl/AHEAD  END/AHEAD  ELSEJ  / 




<ERROR>    1    I 


<LA«FL> 


<VAI.  UE> 


I  »■ 


t  l> 


<*!>  PTtSEMANTICTESTl-TYPFOFfPl) 

LABELTYPF  J]  J 

i_Ist  <arTthmEttc   PRTMARY> 

SEPARATOR  f  ♦  /  -  /  x  /  #/  1  I 


<Rnr>LEAW> 


<R0nLEAM  PRTMARY> 


<ARTTHMFTIC  PRIMARY> 


I  l  » 


I  i  ■ 


LIST    <ROOLEAm  PRIMARY> 
SEPARATOR  I     ANn  /  OR  ]  I 


<*I>  #TrSEMANTICTFSTi«TYPEOrf PI  )  ■ 
BOOLEANTvpF  |]/ 

(   <B00I.EAN>  )  / 
<VALUE> 

f  ■  /  4    /   #</#>/   #<   /  tl       1 
<VALUE>  I 


t  I! 


(   <VALl)E>   ) 
<VARIABLE>   I 


<VAPIABI  F> 


I  I 


<*I>  PTtSEMANTlCTESTl«TYPFOF(Pl) 
INTE6FRTYPE  J]  I 




<EPROR>  I  la 

<ANY>   r  <ANY>  *UT  #1  BUT  ttUQ    1  ♦  I 


fno 
x**********************************************
g********************************************** 

BOOLEAN PROCEDURE PROGRAMTEST( R10 ) ;
VALUE R10 ; INTEGER R10 ;

BEGIN LABEL FIN, RENAME ;
INTEGER N10 ;
LABEL L10, L11 ;
LABEL ACCEPT10 ;
LABEL NXTALT10, NXTALT11 ;


NXTALT10: N10 := R10 ;

L10: IF BLOCKTEST( N10 ) THEN

BEGIN N10 := N10 + 1 ;

END ELSE GO TO NXTALT11 ;
GO TO ACCEPT10 ;

ACCEPT10: DELETE(R10, N10 - R10) ;
RENAME: RSTR[R10] := 42 ;
FIN: PROGRAMTEST := TRUE ;

NXTALT11: END ;
K********* *********************** ************** 

BOOLEAN PROCEDURE BLOCKTEST( R10 ) ;
VALUE R10 ; INTEGER R10 ;

BEGIN LABEL FIN, RENAME ;
INTEGER N10 ;

BOOLEAN SECOND11 ;

LABEL L10, L11, L12, L13, L14, L15, L16 ;
LABEL ACCEPT10 ;
LABEL NXTALT10, NXTALT11 ;


IF FETCH(R10) = 45 THEN GO TO FIN ;


NXTALT10: N10 := R10 ;

L10: IF FETCH( N10 ) = WORDBEGIN THEN

BEGIN N10 := N10 + 1 ;
END ELSE GO TO NXTALT11 ;
L11: IF DECLARATIONTEST( N10 ) THEN
BEGIN N10 := N10 + 1 ;

DELETE(N10-2, 2) ; N10 := N10 - 1 ; GO TO L11 END ;
L12: IF STATEMENTTEST( N10 ) THEN

BEGIN N10 := N10 + 1 ;

IF SECOND11 THEN BEGIN DELETE(N10-3, 3) ; N10 := N10-2 END ;

SECOND11 := TRUE ;
L13: IF FETCH(N10) = 46 THEN

BEGIN N10 := N10 + 1 ;
GO TO L12 END END ELSE N10 := N10 - 1 ;
IF NOT SECOND11 THEN GO TO NXTALT11 ;

SECOND11 := FALSE ;

L15: IF FETCH( N10 ) = WORDEND THEN

BEGIN N10 := N10 + 1 ;

END ELSE GO TO NXTALT11 ;
GO TO ACCEPT10 ;

ACCEPT10: DELETE(R10, N10 - R10) ;
RENAME: RSTR[R10] := 45 ;
FIN: BLOCKTEST := TRUE ;
NXTALT11: END ;
BOOLEAN PROCEDURE DECLARATIONTEST( R10 ) ;

VALUE  RIO  I  INTEGER  RIOJ 
BEGIN  LABEL  EIN, RENAME  I 

iMTEGtR  N10»N!1#N1?I 

IwTEGrR  R11»R12) 

BOOLEAN  SECONnil) 

BOOLEAN  BC11#RC1?I 

LABEL  L10»lU»L1?»L13»L1*»L15#l1*'L1^'L1«'L19»L20h21I 

LABEL  L?2#L23l 

LABEL  ACCEPT10#ACCEPT1 1>ACCEPT12#ACCEPT1 3| 

LABEL  NXTALT10,NXTALTU»NXTALTl2»NXTALTt3»N*TALTl4,NXTALTH) 
LABEL  NXTALTlA,NXTALTir»NyTALTt8#NXTALTl9#NXTAlT?0| 

IE  rrTCM(RlO)  •      49  THEN  GO  TO  TIN  I 

NXTALTiniNlO  •■  RlO  I 

LtOiBEGTN  BCH  la  EAISE  J  Rll  la  NlO  J 

NXTALTl?lNll  !■  Rll  I 

LlltlF  FrTCH(  Nil  )  »  WOROINTEGER       THEN 

BEGIN  Nil  !■  N1  1  ♦  II 

ENn  ELSE  GO  TO  NXTALT13  I 


TYPEl.INTEGERTYPEJ 


GO  TO  ATCEPTU  t 

NXTAlTIIiNII  |a  Rll  | 

L12HE  FETCH?  Nl  1  )  ■  WOROBOOLEAN       THEN 

BECIN  Nil  la  Nil  ♦  II 

ENn  ELSE  GO  TO  NXTALT1A  I 


60    TO    ArcFPTll     I 
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TYPFl.BOOLEANTYPEI 


NXTAI.  Tl«  |N1  1  l»  «11  I 

LI 3l IF  rrTCHC  N11  )  ■  WOROLABEL   THEN 
BFMN  N11  la  Nil  ♦  II 

ENn  FLSF  GO  TO  NXTALT15  I 

GO  TO  ArcFPTll  I 

ACCTPT11 iDELFTEfRtl#Nll«Rll5  I 

BC11 la  TRUE  I 

NXTAlTHlFND  I 

IF  «C11  THEN  REGIN  NlO  |a  NlO  ♦  1  I 
ENn  ELSF  GO  TO  N*TALT11  I 

Ll^iBEGTN  BC11  la  FALSE  I  R1 1  la  NlO  I 
NXT8LT1MN11  «•  Rll  I 

l16iregtn  bci?  t«  false  i  ri2  ia  nii  i 


typei.laBeltypf  I 


NyTALTlAfNl?  |b  Rl2  1 

L17HF  FFTCH(N1?)  «     *  THEN 

BEGTN  N12  la  Nl2  ♦  II 
ENn  FLSF  Gn  TO  YXTALT19  I 
FIRSTI.P12I  L*ST|«N12   -   R12  ff 

GO  TO  ATCFPT13  I 

ACCrpT11lDELFTEfRl2»Ml2-Rl2)  I 
«Cl2l»  TRUE  I 

NYTALTl'lFND  I 


FNTER(P1»TYPE)I 
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IF  PCl?  THEN  BEGIN  Nit  t>  Nil  ♦  1  J 

IF  SFCONOH  THFN  BFGlN  DELETF(NlO«3» 3 ) I Nl 0 l«Nl 0-2  END  I 

SFCONOll  !■  TRUE  I 
L19HF  rrTCH(NH)  *    5A  THFN 

BF^IN  Nil  !■  Nil  ♦  II 

on  to  ma  end  end  fise  nii  ib  nii  •  1  i 

IF  MOT  <rcONOtl  THEN  GO  TO  NXTALT1?  J 
SECnNDll  !■  FALSE  I* 
GO  TO  Af*CFPTl2  » 


NXTALTlTiNll  »«  R 1 1  I 

L?1HF  TRRORTEST(  Nil  )  THEM 

BEftlN  Nil  IB  Nil  ♦  I  J 
ENH  FLSF  GO  TO  NXTALT20  i 

GO    TO    AfCFPTl?     I 
ArCrPT19|OELFTEfRll,^11-Rin    I 
BClll«    TRUF    t 

N^TALT?0|FNO  I 

IF  OC11  THEN  BEGIN  NlO  IB  N10  ♦  1  I 

ENn  FLSF  GH  TO  *XTALT11  J 

GO  TO  ACCEPT10  I 

ACCrPTlOlOELFTEf  RlO,ig10  •  R10)  I* 

RENAME   iRSTRlOl  !■      49  if 

FlNi  DFCLARATIONTEST  Ib  TRUF  IS 

NXTAI  TlllFNOl* 

ROOLEAN  PROCEDURE  ST ATEMENTTESTt  RIO  )  I 

VA!  UF  RIO  |  INTEGER  R10J 
BEGIN  LABEL  FIN»RENAME  I 




IwTEGrR    NlO,NU,Nl2j 

WEGFR    Rl1,R12» 

RnnLFAN  BCi | ,pC 12| 

LAREL  L10,LU»L12,L13,LU»L15#L16,L17,L18»L19,L20,|  21) 

LftRFL  L?2.L23,L2«,L25,L26,L27,L28,L29,L30»L31,L32,L33| 

L*REL    L34»L35»L36»L37'L38#L39#l40J 

LAREL     AC CFPT 10, ACCEPT  1  1, ACCEPT 12,  ACCEPT  13,  ACCFPTU,ACCFPT  15, ACCEPT  16J 

LABEL    NVTALT10,NXTALT1 1 » NXT ALTl 2 » NXTM  Tl 3, NXT ALT 1« »NXT ALT1  9  I 

LABEL    N*TAl  T1A,N*TALT17,NVTALT1R#NXTAI  Tl  9,  N*TA  LT20  ,  NXTAt.  T21  ) 

LAREL  NXTALT22#NXTALT2  3,NyTALT?«'NXTALT25,NXTALT26,NXTALT?T| 

LABEL  nXTAlT2A,nXtALT?9,N*TAlT30,nXtAIT31| 


TE  TETCMfRlO) 


5?  THEN  GO  TO  FIN  J 


NXTALTlfllNlO  la  RIO  J 

LlOiREGTN  BC11  »■  FALSE  I    R1 1  «■  NlO  I 


NxTALTl?tNll  |a  Rll  | 

Lll  HE  I  ARELTEST(  N1  1  )  THEM 
BEGTN  Nil  is  Nil  4  1  J 
ENn  ELSE  G"  TO  NXTALT13  I 

L12IIF  FFTCHfNH)  «    13  THFN 
BEGIN  N11  l«  Nil  ♦  II 
ENn  ELSF  GO  TO  N*TALTl3  * 

GO  TO  ArcEPTU  I 

ACCFPTH  mELFTEfRll,Nll-Rin    ) 
PC11 la    TRUE    I 

N*TALTMtFNO    » 

IE    RC11    THEN    BEGIN    NlO    |a    NlO    ♦    1     I 
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OFI  ETEfMlO-2,2)     I    NllOi.NlO    -    1    |    Qn    TO    L1*    ENO    | 
LM«RE6TN    BC 1  1     »*    FALSE     t     R11     »■    NlO    ) 

NXTALTlfllNll  l«  R 1  1  I 

Ll5tBFGTN  BC1?  t»  FALSE  I  R12  Ib  Nil  I 


N*TALTlAlNl?  »«  R12  » 

L1 ^  t  Tf  rFjCHf  N*2  )  ■  WORDGO 

RFr,  TM  Ml?  »«  Nl2  ♦  1» 

FNn  F"l5F  Gn  T0  ^XTALTlT  | 

L17HF  TFTCHf  N12  )  •  WOROTn 
B  F  ft  T  N  Ml?  !■  N  1  2  ♦  ll 
ENn  FLSF  SO  TO  NXTALT17  i 

GD  TO  A<*CFPT13  » 


THFN 


THFN 


N^TALTlTiNl?  !■  R12  I 

L18HF  rrTCHf  N12  )  b  WORDGPTO    THFN 

BEGIN  Ml  2  la  Ml  2  ♦  If 
FNn  FLSF  GO  TO  NXTALTlA  I 

GO  TO  ACCFPT13  I 

ACCrPTHinFLFTErR12,Ml2-Rl2)  I 

«C12t»  TRUE  I 

NXTALTl*lFNn  > 

IF  OT12  THEM  BEGIM  Nil  Ib  N11  ♦  1  I 

ENn  FLSF  GO  TO  M*T*LTH  * 
L?OiTF  I  ARELTEST(  Nil  )  THEM 

BEGIN  Nil  ib  N11  ♦  1  I 

ENn  FLSF  Gn  TO  MVTALT1*.  I 
GO  TO  ACCFPT12  I 
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NXTALTHlNll     I"    Rll     I 

L2iur  rnCHf  nii   j  »  wordif  then 

BEGIN  N11  la  Nil  ♦  II 
ENn  ELSF  GO  TO  N*TALTl9  I 

L??lIF  »»OOLEANTFST(  Nil  )  THEN 
BEGIN  Nil  i.  Nil  ♦  1  | 
ENn  ELSE  GO  TO  NXTALT19  ) 

L?3iIF  FFTCHC  N11  )  •  WORDTHEN    THEN 
BEGIN  Nil  la  Nil  ♦  II 
ENn  ELSE  Gn  TO  N*TALT1<J  I 

L?*»TE  STATFMENTTESTC  Nil  )  THEN 
BFGTN  Nil  l«  Nil  ♦  1  I 
ENn  FLSE  GO  TO  NXTALTlO  I 

L?5iBEGTN  BC1?  i«  FALSE  I  R12  la  Nil  I 


NxT«LT2n,Nl2  t>  R12  j 

L26ITF  FTTCHf  N12  )  *  HORDE!  SE    THEN 
BEGIN  N12  la  N12  ♦  II 

ENn  ELSE  Gn  TO  NXTALT21  I 

L27iTF  <TATFMENTTEST(  N12  )  THEN 
BEGIN  N12  is  N12  ♦  1  I 
ENn  ELSE  Gn  TO  N*TALT21  I 

GO  TO  ATCFPTl4  » 

ACCrPTl«mELFTEfRl2,Nl2-Rl2^    I 
«C12i«    TRUE    I 

N*TALT21  iFND  I 

IF  BC12  THEN  BEGIN  Nil  l«  N1 1  ♦  1  I 

ENn  i 
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GO  TO  ArcFPTl?  > 

NXTALT19IN11  »■  Rl 1  I 

L29t!F  VARIA«LETEST(  Nil  )  THEN 
BEftTN  Nil  la  Nil  ♦  1  I 
ENn  ELSF  Gn  TO  N*TALT2?  > 

L3OiBEGTN  BC1?  la  FALSE  I  Rl2  la  Nil  I 

NXTALT23IN12  l«  R12  I 

L31UF  TFTCMfNl?)  ■    13  THEN 

BFGIN  N12  la  N12  ♦  II 
ENn  ELSE  GO  TO  N*TALT24  | 

L32ITF  rrTCH(N12)  «    M  THFN 
BEGIN  N12  la  N12  ♦  II 
ENn  ELSE  GO  TO  MXTALT24  I 

GO  TO  ACCEPT15  I 


NXTALT2A|N12  la  R12  | 
L33HF  FETCHtNl?)  ■    31  THEN 
BFr,!N  N12  | a  N12  ♦  II 
ENn  ELSE  GO  TO  NXTALT2*,  I 
GO  TO  ATCEPT15  I 

ACCFPTl^lDELFTErRl2,Nl2-Rl2)  I 
RC12t«  TRUE  I 

NXTALT2^lFN0  I 

IF  BC12  THEN  BEGIN  Nil  la  Nil  ♦  1  I 

ENn  ELSE  GO  TO  NXTALT2?  I 
L35UF  VALUFTE$T(  Nil  )  THEN 

BEGIN  Nil  la  Nil  ♦  1  J 
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Enh  FLSF  Gn  TO  NXTALT2?  J 

c,n  Tn  ArrrPTi?  » 

NXTALT29|Nll     I"    R11     I 

GAPf     Nil     5     I     Nil     l«    Mil     ♦     1     ) 

I  3<StBE:GTN    BC1?    t«    FALSE    *    R12    »■    N11     t 

NXT«LT27tNl?     »«    R12    I 

IF    MOT    rrTCHf    N12    )    *         «6    THEN    GO    TO    NXTALT?8    I 

GO    TO    ATFPTKS    I 

NXTALT?«tNl?  I*  R12  J 

IP  MOT  rrTCHf  N12  )  «  8?29  THFN  GO  TO  NXTALT29  I 

GO  TO  AfCFPTl6  » 

NXTALT?0|N1?  |B  R12  J 

IP  MPT  rrTCHf  N12  )  •  8967  THFN  GO  TO  NXTALT30  t 

GO  TO  ATFPT16  I 

ACCrPTI  MnELFTEfR12»Ml2-Rl2l    I 

RC12»«    TRllP    I 
NXT«LT3ftlPNO    I 
IP    °C12    THEN    RPGIN    Nil     l»    N11    ♦    1     I 

EN*  ELSE  GO  TO  MXTALT2*  I 
GO  TO  ArcFPTl?  I 


NKTALT2MN11  li  Rl 1  | 
L38HF  FRRORTESTf  Nil  )  THEM 
Bpr'TN  Nil  !■  Nil  ♦  1  | 
ENn  ELSF  GO  TO  NXTAL.T31  i 
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go  to  accepti?  » 

ACCrPTl'iDELFTEfRll.Mll-Rin  I 

RCllla  TRUE  « 
NXTALT31 IFNH  f 
IF  BC11  THEN  BEGIN  NlO  la  N10  ♦  1  I 

EN*  FLSF  GO  TO  N*TA|_T11  J 
GO  TO  ArcFPTlO  I 

ACCrPTl^tnELFTEf R10,N10  -  R10)  |* 
RENAME   tRSfRlOl  |a      5?  I* 
FlNi  STATFMfnTTFST  |a  TRUE  I* 
NXTAI  Til  iFNOl* 
****  ******************************************* 

BOOLEAN PROCEDURE LABELTEST( R10 ) ;

VALUE R10 ; INTEGER R10 ;
BEGIN LABEL FIN, RENAME ;

INTEGER N10 ;

LABEL L10, L11 ;

LABEL ACCEPT10 ;

LABEL NXTALT10, NXTALT11 ;


IF FETCH(R10) = 51 THEN GO TO FIN ;


NXTALT10: N10 := R10 ;
L10: IF FETCH(N10) = 3 THEN
BEGIN N10 := N10 + 1 ;
END ELSE GO TO NXTALT11 ;
FIRST := R10 ; LAST := N10 - R10 ;

SEMANTICTEST := TYPEOF(P1) = LABELTYPE ;
IF NOT SEMANTICTEST THEN GO TO NXTALT11 ;
GO TO ACCEPT10 ;

ACCEPT10: DELETE(R10, N10 - R10) ;
RENAME: RSTR[R10] := 51 ;
FIN: LABELTEST := TRUE ;
NXTALT11: END ;

BOOLEAN PROCEDURE VALUETEST( R10 ) ;

VALUE R10 ; INTEGER R10 ;
BEGIN LABEL FIN, RENAME ;

INTEGER N10 ;
BOOLEAN SECOND11 ;

LABEL L10, L11, L12, L13 ;
LABEL ACCEPT10 ;

LABEL NXTALT10, NXTALT11 ;


IF FETCH(R10) = 60 THEN GO TO FIN ;


NXTALT10: N10 := R10 ;

L10: IF ARITHMETICPRIMARYTEST( N10 ) THEN

BEGIN N10 := N10 + 1 ;

IF SECOND11 THEN BEGIN DELETE(N10-3, 3) ; N10 := N10-2 END ;

SECOND11 := TRUE ;


L11: WRK := FETCH( N10 ) ;
IF WRK = 16

OR WRK = 44

OR WRK = 32

OR WRK = 49
THEN BEGIN N10 := N10 + 1 ;
GO TO L10 END END ELSE N10 := N10 - 1 ;

IF NOT SECOND11 THEN GO TO NXTALT11 ;
SECOND11 := FALSE ;
GO TO ACCEPT10 ;

ACCEPT10: DELETE(R10, N10 - R10) ;
RENAME: RSTR[R10] := 60 ;

FIN: VALUETEST := TRUE ;
NXTALT11: END ;
X *********************************** ****** 

BOO!  EAN  PROCEDURE  BOOLE  ANTE«,T{  RlO  )  J 

VA!  |)E  RlO  I  INTEGER  RIOJ 
REGTN  LABFL  FIN.RENAME  I 

IMTEGFR  NIOI 

BOOLEAN  SECONDIH 

LABEL  LlO,Lll#Ll2#ll3f 

LABEL  ACCFPT10J 

LABEL  NXTALT10,NXTALT!il 


IF  FFTCHfRlO)  ■ 


5fl  THFN  GO  TO  FIN  I 


NXTALTlOlNlO  la  RIO  I 

LlOiIF  ROOLEANPRIMARYTESTC  WlO  )    THEN 

BEGIN  NIO  la  NIO  ♦  1  I 

IF  SECONOH  THEN  BFQlN  DELETF f NlOO ,3 ) J NlO | »Nl 0-?  FND  I 

SECOND11  la  TRUE  I 


Lll IWRK  !■  FETCH(  NlO  )  t 
IF    W&K  bWORDANO 
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OR  WRK  ■  WOR00R 

THEN   BFGIN  N10  «■  NlO  ♦  1  I 
GO  TO  L'O  Eno  End   FfsE  NlO  |«  nIO  •  1  | 
IF  WOT  <FCONOH  THEN  GO  TO  NXTALTU  I 

SECOND11  !■  FALSE  It 
GO  TO  ACCFPTIO  f 

ACCrPTlOlOELFTEfRlO.NlO  •  R10)  |f 
RENAME   tRSr RIO  ?  !■      5«  II 
FlNi  BOOLFANTEST  i«  TRUE  IS 
NXTAl Til IFNDI* 

X  *******************  ************************* 

BOOLEAN PROCEDURE BOOLEANPRIMARYTEST( R10 ) ;
VALUE R10 ; INTEGER R10;
BEGIN LABEL FIN,RENAME ;
INTEGER N10;
LABEL L10,L11,L12,L13,L14,L15,L16,L17;
LABEL ACCEPT10;
LABEL NXTALT10,NXTALT11,NXTALT12,NXTALT13;

IF FETCH(R10) = 86 THEN GO TO FIN ;

NXTALT10:N10 := R10 ;
L10:IF FETCH(N10) = 3 THEN
BEGIN N10 := N10 + 1;
END ELSE GO TO NXTALT11 ;
FIRST:=R10; LAST:=N10 - R10 ;%
SEMANTICTEST:=TYPEOF(P1) = BOOLEANTYPE ;
IF NOT SEMANTICTEST THEN GO TO NXTALT11 ;
GO TO ACCEPT10 ;

NXTALT11:N10 := R10 ;
L11:IF FETCH(N10) = 29 THEN
BEGIN N10 := N10 + 1;
END ELSE GO TO NXTALT12 ;
L12:IF BOOLEANTEST( N10 ) THEN
BEGIN N10 := N10 + 1 ;
END ELSE GO TO NXTALT12 ;
L13:IF FETCH(N10) = 45 THEN
BEGIN N10 := N10 + 1;
END ELSE GO TO NXTALT12 ;
GO TO ACCEPT10 ;

NXTALT12:N10 := R10 ;
L14:IF VALUETEST( N10 ) THEN
BEGIN N10 := N10 + 1 ;
END ELSE GO TO NXTALT13 ;
L15:WRK:=FETCH( N10 ) ;
IF WRK = 61
OR WRK = 60
OR WRK = 30
OR  WRK  ■  U 
OR WRK = 47
OR WRK = 15
THEN BEGIN N10 := N10 + 1 ;
END ELSE GO TO NXTALT13 ;
L16:IF VALUETEST( N10 ) THEN
BEGIN N10 := N10 + 1 ;
END ELSE GO TO NXTALT13 ;
GO TO ACCEPT10 ;

ACCEPT10:DELETE(R10,N10 - R10) ;%
RENAME   IPSTR101  I"      86  IX 
FIN: BOOLEANPRIMARYTEST := TRUE ;%
NXTALT13:END;%
% *********************************************
BOOLEAN PROCEDURE ARITHMETICPRIMARYTEST( R10 ) ;
VALUE R10 ; INTEGER R10;
BEGIN LABEL FIN,RENAME ;
INTEGER N10;
LABEL L10,L11,L12,L13,L14;
LABEL ACCEPT10;
LABEL NXTALT10,NXTALT11,NXTALT12;

IF FETCH(R10) = 82 THEN GO TO FIN ;

NXTALT10:N10 := R10 ;
L10:IF FETCH(N10) = 29 THEN
BEGIN N10 := N10 + 1;
END ELSE GO TO NXTALT11 ;
L11:IF VALUETEST( N10 ) THEN
BEGIN N10 := N10 + 1 ;
END ELSE GO TO NXTALT11 ;
L12:IF FETCH(N10) = 45 THEN
BEGIN N10 := N10 + 1;
END ELSE GO TO NXTALT11 ;
GO TO ACCEPT10 ;

NXTALT11:N10 := R10 ;
L13:IF VARIABLETEST( N10 ) THEN
BEGIN N10 := N10 + 1 ;
END ELSE GO TO NXTALT12 ;
GO TO ACCEPT10 ;

ACCEPT10:DELETE(R10,N10 - R10) ;%
RENAME   iPSrRlOT  l«      6?     II 
FIN: ARITHMETICPRIMARYTEST := TRUE ;%
NXTALT12:END;%

% *********************************************

BOOLEAN PROCEDURE VARIABLETEST( R10 ) ;
VALUE R10 ; INTEGER R10;
BEGIN LABEL FIN,RENAME ;
INTEGER N10;
LABEL L10,L11 ;
LABEL ACCEPT10;
LABEL NXTALT10,NXTALT11 ;

IF FETCH(R10) = 77 THEN GO TO FIN ;

NXTALT10:N10 := R10 ;
L10:IF FETCH(N10) = 3 THEN
BEGIN N10 := N10 + 1;
END ELSE GO TO NXTALT11 ;
FIRST:=R10; LAST:=N10 - R10 ;%
SEMANTICTEST:=TYPEOF(P1) = INTEGERTYPE ;
IF NOT SEMANTICTEST THEN GO TO NXTALT11 ;
GO TO ACCEPT10 ;

ACCEPT10:DELETE(R10,N10 - R10) ;%
RENAME   iRSrRlOl  I"      T7     t% 

FIN: VARIABLETEST := TRUE ;%
NXTALT11:END;%

% *********************************************

BOO!  EAN  PROCfOU»E  ERRORTESTf  RIO  )  J 

VAI  OF  RIO  I  INTEGER  Rlf») 
BEGTN  LAREL  FlN»RENAME  » 
IWTE6TR  NlOtNl 1  I 

IWTEGTR  Rll I 
BnntFAN  BC1 1) 

L«REL  I.10»L11»L12> 

LABEL  ACCFPT10,ACCEPT11I 

L«BEL  NXTALTin,NXTALTll*NYTALT12#NXTALTl3> 


TE  FFTCWfRlO) 


63  THEN  GO  TO  FIN  J 


NXTALTiniNlO  »«  RIO  I 

WRK  la  FFTCHf  N10  )|^10  |>  mIO  ♦  1| 

LlOlBEGTN  BC11  «»  FALSE  I  Rl 1  «■  NIO  I 


NXTAl Tl'tNll   ta  Rll  I 

WRK  la  FETCHf  N1  1  )IMll  |a  wll  ♦  \f 

IF  FFTCMfNll-1)  »    *6   THEN  GO  TO  NXTALT13  I 

IF  FFTCM(Nll-l)  ■  WOROEMO     THFN  GO  TO  NXTALT13  I 
GO  TO  ACCFPT11  I 
ACCFPT11  lOELFTEfRU#Ntl-Rin  I 
«Cllia  TRUF  I 
NXT4LT11IFND  I 

IF  «C1!  THEN  BEfilN  N10  la  N10  ♦  1  I 

DPI FTEfNtO-2#2)  J  NlOl.NlO  -  1  I  GO  TO  Lll  ENO  f 
60  TO  ACCEPT10  t 

ACCrPT10IDELFTFfR10,NlO  •  R10)  I* 
RENAME   iRSfRlOT  l«      63  f| 
FINi  ERPORTEST  la  TRUE  I* 
NXTA!  Til  lENDl* 
APPENDIX D
B-5500 IMPLEMENTATION

Introduction

TWST is the prefix of all files of the system in the B-5500
implementation. The three files required to translate a grammar
(including semantics) are TWST/HIGH, TWST/SKELETON, and TWST/COMPILE.
The first of these is used merely to separate syntax and semantics
and may be bypassed if desired.

Global  Description 

Syntax and semantics may be written on one file called
SOURCE, which is partially separated by a program TWST/HIGH into
syntax and semantics. The syntax is passed through TWST/COMPILE,
which translates the syntax into a collection of ALGOL procedures.
The semantics is then merged in and the whole happy affair passes
to the ALGOL compiler.


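[Figure: flow of files through the system. The input file SOURCE
(syntax and semantics on one file) is split by TWST/HIGH into syntax
and semantics. The syntax passes through TWST/COMPILE; its output,
TWST/SKELETON, and the semantics are merged to form <name>/SOURCE,
which ALGOL compiles.]
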
The syntax and semantics are written together in a single
file called "SOURCE". The program TWST/COMPILE takes this input and
converts the syntax parts into Burroughs ALGOL. This is then merged
with the file TWST/SKELETON and <name>/SEMANTICS to form
<name>/SOURCE, which is the ALGOL code for the compiler. The file
TWST/SKELETON contains a scanner, a control card procedure, an error
recovery procedure and several other procedures required by the
system. The file <name>/SOURCE is resequenced by 100's after merging.

The  Syntax  Language 

The  syntax  language  is  very  similar  to  BNF.   However, 
various  abbreviations  have  been  made  both  for  user's  convenience  and 
to  improve  the  efficiency  of  the  resulting  ALGOL  code.   Specifically, 
the  following  extensions  have  been  made: 

(i)    Kleene star: A star (*) following a symbol means any number
of repetitions of that symbol:

<A>* = < > | <A> | <A> <A> | ...

(ii)   Brooker and Morris's question mark to mean the optional
presence of some symbol:

<A> ? = <A> | < >

(note that & is regarded as equivalent to ?)

(iii)  list <A>: is merely a different notation for:

<A> <A>*

(iv)   list <A> separator <B> = <A> [<B> <A>]*

There  are  certain  conventions  which  apply  to  list  and  * 
operations:   these  are  discussed  in  the  section  on  implementation. 

(v)    Brackets [ ]: delimit groups of symbols--(iv) above is
an example. The production:

<X> ::= <Y> [ <A> <B> <C> ] <Z>

is entirely equivalent to

<X> ::= <Y> <dummy> <Z>
<dummy> ::= <A> <B> <C>

Naturally brackets can be nested to any depth--behold the
following compact production for a Boolean expression:

<Boolean expressions> ::=
    list [
        list <Boolean primary> separator [∧ | and] ]
    separator [∨ | or ]

(vi)   <any>  =  any  symbol  whatever 

(vii)  but t: this is normally used in conjunction with <any> to
express things like

<comment> ::= comment [<any> but #;]*

(viii) not t, ahead t

These are one symbol lookahead instructions which check
for the absence or presence of the terminal t. Any number of
not's may be used but only one ahead. Their use is largely in code
optimization (of object code produced) and to a more limited extent
to fudge EXEC calls. Thus:

<alternative> ::= not #; @S23 list - - - ;

It is important that action 23 not be called if an
<alternative> is not present, so the not #; prevents it being called
in this case.

(ix)   back 

This  is  a  one  symbol  lookback  which  is  provided  but  appears 
never  to  have  been  used. 

(x)    Calls  on  semantic  actions. 

These  are  of  two  types,  both  preceded  by  @  immediately 
followed  by  "S",  "T",  or  "Q".   Semantic  actions  are  the  subject  of 
a  separate  section. 
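
The abbreviations *, ? and list can be read as small recognizers over a sequence of symbols. The following Python sketch is not part of the TWST system: the function names (sym, seq, star, opt, list_of) and the convention of returning the next position, or None on failure, are assumptions made purely for illustration.

```python
# Illustrative recognizers for the *, ? and list abbreviations.
# Each recognizer takes (tokens, pos) and returns the new position,
# or None if it fails. All names here are hypothetical.

def sym(t):
    """Recognize the single terminal t."""
    def rec(tokens, pos):
        return pos + 1 if pos < len(tokens) and tokens[pos] == t else None
    return rec

def seq(*parts):
    """Recognize each part in turn."""
    def rec(tokens, pos):
        for part in parts:
            pos = part(tokens, pos)
            if pos is None:
                return None
        return pos
    return rec

def star(a):
    """<A>* : any number of repetitions of <A>, including none."""
    def rec(tokens, pos):
        while True:
            nxt = a(tokens, pos)
            if nxt is None or nxt == pos:   # stop on failure or no progress
                return pos
            pos = nxt
    return rec

def opt(a):
    """<A>? : <A> or the null symbol."""
    def rec(tokens, pos):
        nxt = a(tokens, pos)
        return pos if nxt is None else nxt
    return rec

def list_of(a, separator=None):
    """list <A> is <A> <A>*; with a separator, each further <A> is
    preceded by the separator."""
    if separator is None:
        return seq(a, star(a))
    return seq(a, star(seq(separator, a)))

term = sym("t")
expr = list_of(term, separator=sym("+"))
print(expr(["t", "+", "t", "+", "t"], 0))   # position after the whole list
print(expr(["t", "+"], 0))                  # a trailing separator is not consumed
```

The guard against "no progress" in star reflects the caution given later about empty productions: a member that can match nothing would otherwise repeat forever.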

Syntax  Rules 

There are certain syntax rules of the BNF input which are
provided mainly to avoid ambiguous productions, but also to protect
the user to some extent (e.g., separator occurring by itself could
be regarded as a terminal of the language defined--it seems more
reasonable to prohibit its use in this way).

(i)    <,   >,   [ ,  ],  ;,  |,  &,  list,  *,  separator,  not,  open,  close, 
but,  ahead,  any,  @,  #  may  not  be  used  in  the  grammar  without  being 
preceded  by  #--hence  #<,  #>,  #g,  ##. 

(ii)   each production must be terminated by ;

(iii)  separator may not be used without a preceding list

(iv)   exits  to  semantic  routines  are  designated  by  @.  This  will  be 
discussed  later  as  will  the  words  open  and  close  which  also  have 
special  significance. 

(v)    only  identifiers  and  spaces  may  appear  between  "<"  and  ">" 

(vi)   <  >  must  always  be  used  to  represent  the  null  symbol. 
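
Rule (i) amounts to a quoting convention: a metasymbol preceded by # is a terminal of the language being defined. The Python sketch below is only an illustration of that convention. The function name classify and the assumption that symbols arrive already separated (so that #; appears as "#" followed by ";") are not taken from the report.

```python
# Hypothetical sketch of rule (i): a metasymbol preceded by '#' is taken
# literally. The input is assumed to be a list of already separated
# symbols; the real scanner is not described at this level of detail.

META = {"<", ">", "[", "]", ";", "|", "&", "list", "*", "separator", "not",
        "open", "close", "but", "ahead", "any", "@", "#"}

def classify(symbols):
    """Return (kind, text) pairs, kind being 'meta' or 'terminal'."""
    out = []
    i = 0
    while i < len(symbols):
        s = symbols[i]
        if s == "#":
            if i + 1 >= len(symbols):
                raise ValueError("'#' must be followed by a symbol")
            out.append(("terminal", symbols[i + 1]))   # escaped: literal
            i += 2
        elif s in META:
            out.append(("meta", s))
            i += 1
        else:
            out.append(("terminal", s))
            i += 1
    return out

print(classify("list stmt separator # ; end ;".split()))
```

In the printed result the first ";" is a terminal of the defined language, while the final ";" is the metasymbol that terminates the production.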


Alpha Procedure SCAN;

SCANMODE = 0    normal operating mode--identifiers are recognized
                and entered in a table called BIGTAB: the address
                in BIGTAB is returned. Numbers are recognized and
                assembled and the address in BIGTAB is returned.

SCANMODE = 1    same as 0 except blanks or repeated blanks are
                also recognized separately as a single blank.

SCANMODE = 2    returns each character except blanks which are
                ignored.

SCANMODE = 3    same as 0 except multiple blanks are reduced to
                single blanks.

SCANMODE = 4    same as 0 except identifiers, etc. are not entered
                in BIGTAB.

SCANMODE = 5    return every character.

There are a number of Boolean variables which also control
the action of the scanner and if set to true cause the following:

LETTERCHAR    no assembly of identifiers; i.e., each letter is
              returned separately.

DIGITCHAR     each character of a number is returned separately
              with no formation of the number.

STRINGCHAR    no assembly of strings.

IDBLANKS      ignore blanks in an identifier.

Each of the above is normally set false.

FRSTCOL       the first column of each card to be read,
              normally = 1.

LASTCOL       the last column read on each card, normally = 72.

The scanner operates with reference to a table
called CHARCLAS. Thus to set "-" as an alphabetic
character:

CHARCLAS["-"].ALPHABETIC := 1;

Similar assignments may be made for ALPHANUMERIC,
etc.
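
As a rough illustration of how the column limits, the LETTERCHAR and DIGITCHAR switches, and the character class table interact, here is a small Python sketch. It is not the SCAN procedure: the function name scan, its parameters, and the use of a Python set in place of CHARCLAS are assumptions, and BIGTAB, strings and the SCANMODE settings are left out.

```python
# Hypothetical sketch of a table-driven scanner in the spirit of SCAN.
# Only identifier and number assembly and the column limits are modelled.
import string

ALPHABETIC = frozenset(string.ascii_uppercase)  # stands in for the CHARCLAS table
DIGIT = frozenset(string.digits)

def scan(card, alphabetic=ALPHABETIC, frstcol=1, lastcol=72,
         letterchar=False, digitchar=False):
    """Return the symbols on one card, reading columns frstcol..lastcol."""
    text = card[frstcol - 1:lastcol]
    out, i = [], 0
    while i < len(text):
        c = text[i]
        if c == " ":
            i += 1                          # blanks separate symbols
        elif c in alphabetic and not letterchar:
            j = i
            while j < len(text) and (text[j] in alphabetic or text[j] in DIGIT):
                j += 1
            out.append(text[i:j])           # assembled identifier
            i = j
        elif c in DIGIT and not digitchar:
            j = i
            while j < len(text) and text[j] in DIGIT:
                j += 1
            out.append(text[i:j])           # assembled number
            i = j
        else:
            out.append(c)                   # everything else is returned singly
            i += 1
    return out

print(scan("BEGIN X-1 END"))
# Treating "-" as alphabetic, analogous to the CHARCLAS assignment above:
print(scan("BEGIN X-1 END", alphabetic=ALPHABETIC | {"-"}))
```

With the default table X-1 is returned as three symbols; once "-" is classed as alphabetic it is assembled into a single identifier.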

References  to  output  of  the  scanner  are  by  way  of  the 
special  nonterminals: 

<*I>  meaning  identifier 

<*N>  meaning  integer 

<*R>  meaning  real 

<*S>  meaning  string 
Executive Calls SEMANTICS

For  historical  reasons  two  separate  types  of  semantic 
routines  are  accepted.  The  first  is  referred  to  by  number: 

SEMANTIC     (N,  FIRST,  LAST); 
value         N,  FIRST,  LAST; 
integer       N,  FIRST,  LAST; 

This will presumably be a procedure which switches on the
parameter N, using FIRST to point to the first element of the
production and LAST to the final element. It is called from the syntax
by one of three entries:

@ S <integer>    a call on the semantic routine number
                 <integer>, i.e., SEMANTIC (<integer>, - - -)

@ T <integer>    the same as @ S, except that it requires a
                 Boolean variable SEMANTICTEST to be set true
                 in order for syntactic analysis to proceed.

The  second  type  of  call  is  an  inline  semantic  routine: 

<inline semantic routine> ::=
    #@ [ S | T | Q ]
    #[ <algol part> #] ;

When the syntax is translated into Burroughs ALGOL, the
inline semantic routines are inserted in the code produced.

Within a semantic routine, the various parts of a production
may be referred to in two ways:

(i)    communication array COMM[1:100]

FIRST points to the beginning of the production in COMM,
LAST points to the end of it.

(ii)   PM3, PM2, PM1, P0, P1, P2, P3, ..., P9

define PM3 = COMM [FIRST - 4] #,
       ...
       P0  = COMM [FIRST - 1] #,
       P1  = COMM [FIRST    ] #,
       ...
       P9  = COMM [FIRST + 8] #;

thus

<statement> ::=
    if  <Boolean>  then  <statement>  [else  <statement>]?
    P1  P2         P3    P4            P0    P1
                                       \______ P5 ______/

Note that within [ ] the nomenclature changes--bear in
mind the equivalence of [ ] and dummy productions.
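
The P names are simply fixed offsets from FIRST. A minimal Python sketch of that index calculation follows. The helper names p and pm and the Python list standing in for COMM are assumptions, and the offsets for PM1 and PM2 are inferred from the three definitions that are given (PM3, P0 and P1), not quoted from the report.

```python
# Sketch of the P names as an index calculation over COMM.
# p and pm are hypothetical helpers; comm is an ordinary Python list.

def p(comm, first, k):
    """Pk for k >= 0: Pk = COMM[FIRST + k - 1]."""
    return comm[first + k - 1]

def pm(comm, first, k):
    """PMk, counting back from P0: PMk = COMM[FIRST - 1 - k]."""
    return comm[first - 1 - k]

# "if" is the first element of the production, so FIRST points at it.
comm = ["begin", "if", "B", "then", "S"]
first = 1
print(p(comm, first, 1), p(comm, first, 4), p(comm, first, 0))
```

P0 and the PM names reach back before the current production, which is how a semantic routine can inspect what was recognized just ahead of it.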

Examples:

<statements> ::= [<label> :] *
    [if <Boolean> then <statements> [else <statements>] ? |
     do <statements> until <Boolean> |
     while <Boolean> do @S2 <statements> @S3 | <rest> ] ;

ACTION 2:        mark instruction stream; go to fin;

ACTION 3:        compile loop instruction; go to fin;

procedure end of block; ;

<rest> ::=
    begin list <declaration>
    list <statement> separator #;
    @T [end of block;] end;

<assignment statement> ::=
    list [<variable> [← | :=] @S4]
    <arithmetic expression> @S6 @T5;

Comment: Note that the order of execution of the
calls is: 4, 4, ... 4, 6, 5.
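
The call order in the comment above can be checked with a small simulation. This Python sketch is an illustration only: the function name assignment_actions is hypothetical, a variable is taken to be any token followed by ":=", and the arithmetic expression is taken to be a single token.

```python
# Hypothetical trace of the action calls fired while recognizing
#   list [<variable> := @S4] <arithmetic expression> @S6 @T5
# Simplifying assumptions: a variable is any token followed by ':=',
# and the arithmetic expression is a single token.

def assignment_actions(tokens):
    """Return the list of action numbers called, or None if no match."""
    calls, pos = [], 0
    while pos + 1 < len(tokens) and tokens[pos + 1] == ":=":
        pos += 2
        calls.append(4)          # @S4 after each "<variable> :="
    if not calls or pos >= len(tokens):
        return None              # need at least one member, then an expression
    pos += 1                     # the <arithmetic expression>
    calls += [6, 5]              # @S6, then @T5
    return calls

print(assignment_actions(["a", ":=", "b", ":=", "c", ":=", "1"]))
```

Action 4 fires once per left-hand side as the list is built, and actions 6 and 5 fire only after the expression has been found.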

<program> ::= @S [initialize;] <blocks> ;

<dummy statements> ::= @S7;

ACTION 7:        go to fin;

<alternative dummy statement>
    ::= < > ;

<dummy statement and semicolon>
    ::= < > #; @S8;

ACTION 8:        P1 = contents of dummy statement;
                 P2 = semicolon;

<arithmetic  expression>  :  :  = 

list  <term>  separator  [  +  |  -  ]  @S9; 
Comment: Note that when Action 9 is called, the communication
array will have:

<term> + <term> + <term> + <term>

<label> ::= <name> @T10;

ACTION  10:        lookup  PI  to  see  if  it  has  been  correctly  declared; 

set  SEMANTICTEST  accordingly; 

go  to  fin; 

Implementation 

The implementation is a straightforward top to bottom
recursive descent method. The reader is referred to appropriate
chapters of this document for a full description. The following
should be noted:

(i)    A "reduction" can never be undone. A reduction of some
group of non-terminals or terminals is performed by striking
off the tail of the production. Thus once "do <statement>
until <Boolean>" has been reduced to <statement> by discarding
"<statement> until <Boolean>", the information in P2, P3,
P4 is lost.

P1 is preserved and can be used to store information without
it being wiped out.
(ii)   Interpretation of a production proceeds from left to right;
thus

<statement> ::=
    if <Boolean> then <statement> |
    if <Boolean> then <statement> else;

is almost bound to fail because the first alternative
of the production shadows the second.

On the other hand, this left to right scan can be used to
advantage at times; e.g.,

if <Boolean> then <statement>
    [ else <statement> ]?

will automatically pair the inner then and else in the
manner already used by Burroughs.
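
The shadowing effect can be seen in a few lines of Python. This is a simplified model, not the generated ALGOL: alternatives are plain token lists, the names match_prefix and first_match are hypothetical, and the first alternative that succeeds is accepted and never reconsidered.

```python
# Hypothetical illustration of why alternative order matters when the
# first alternative that succeeds is accepted and never undone.

def match_prefix(tokens, pattern):
    """Return the number of tokens consumed if tokens start with pattern."""
    return len(pattern) if tokens[:len(pattern)] == pattern else None

def first_match(tokens, alternatives):
    """Try alternatives left to right; return (index, consumed) of the first hit."""
    for i, alt in enumerate(alternatives):
        n = match_prefix(tokens, alt)
        if n is not None:
            return i, n
    return None

SHORT = ["if", "B", "then", "S"]
LONG = ["if", "B", "then", "S", "else", "S"]
text = ["if", "B", "then", "S", "else", "S"]

print(first_match(text, [SHORT, LONG]))   # short form wins; "else S" is left over
print(first_match(text, [LONG, SHORT]))   # long form is tried first and succeeds
```

Writing the else part as an optional bracket after the common prefix avoids the ordering problem altogether, which is the form recommended in the text.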

Empty  Productions 

Users should be cautioned about using empty productions in
their grammar. Although they will work,

<A> ::= < > ;

will always be found.

Note this example:

<statement> ::=
    begin list <statement> separator #; end |
    begin list <declaration>
        list <statement> separator #; end |
    < > ;

Because <statement> ::= - - - < >, it will always be
found when sought. Thus the system will erroneously find a <statement>
between the begin and declaration in the string:

"begin  real"
       ^ system forces in <statement>

There are two types of list, as has been mentioned before:
the open and closed types. In an open list every member
of the list is present in the communication array:

<term> + <term> + <term> + <term>

In a closed list each succeeding + <term> is deleted as
formed.

Open Kleene stars are similar and may take (0, 1, 2, 3, ...)
locations of COMM. Closed Kleene stars always occupy zero positions.

An open list is indicated by writing open after the list or
list separator construction, and a closed list by the word closed. The
default is open if an action call immediately follows the
construction; otherwise closed.

<A> * @S3           open

<A> * <B>           closed

<A> * open          open

<A> * closed        closed

Note that in counting the number of elements in a production,
<A> ? will count as one only if it is in fact present:

<A> <B> <C> ? <D>

3 or 4 elements.
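
The difference between the two list types is easiest to see in what is left in the communication array. The Python sketch below is an assumption-laden model: recognize_list and its parameters are hypothetical, and a Python list stands in for COMM.

```python
# Hypothetical model of open and closed lists over a communication array.

def recognize_list(comm, first, is_member, is_separator, open_list):
    """Recognize 'list <A> separator <B>' starting at comm[first].
    Returns the number of positions the list occupies afterwards,
    or None if no member is found. A closed list edits comm in place."""
    n = first
    if n >= len(comm) or not is_member(comm[n]):
        return None
    n += 1
    while (n + 1 < len(comm) and is_separator(comm[n])
           and is_member(comm[n + 1])):
        if open_list:
            n += 2                  # open: keep the separator and member
        else:
            del comm[n:n + 2]       # closed: delete them as formed
    return n - first

def is_term(s):
    return s == "t"

def is_plus(s):
    return s == "+"

open_comm = ["t", "+", "t", "+", "t", ";"]
closed_comm = list(open_comm)
print(recognize_list(open_comm, 0, is_term, is_plus, open_list=True), open_comm)
print(recognize_list(closed_comm, 0, is_term, is_plus, open_list=False), closed_comm)
```

An open list therefore suits a semantic action that needs to see every member at once, while a closed list keeps the array short when each member has already been dealt with.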
Each procedure so defined has an identification number; e.g.,

IDNOARRAYIDENTIFIER

as does each reserved word of the language. Every <word> in the
program is taken as reserved and may be referenced by WORD<word>; e.g.,

WORDBEGIN
WORDEND

Interaction  With  Semantics 

The system is at least as flexible as ALGOL because it
contains ALGOL as a subset. It is possible to initiate the search
for a particular nonterminal, <arithmetic expression> say, by calling
the Boolean procedure:

TESTARITHMETICEXPRESSION(N)

having one parameter to indicate where in the communication array
an <arithmetic expression> should be sought.
Note this example:

<procedure parameter part> ::= (@S [seek parameters]);

where procedure "seek parameters" looks up the parameter in a given
position and calls the procedures:

TESTARRAYIDENTIFIER(FIRST + 1)
or TESTPROCEDUREIDENTIFIER(FIRST + 1), etc.
Error Recovery

The user is expected to insert his own error recovery in
the syntax.

Thus:

<declarations> ::= integer [list <*I> separator , | <error>] ;
<error>        ::= <any> [<any> but #;]* ;
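
The <error> production above skips to the next semicolon. A minimal Python sketch of that behaviour is given here. The function name skip_error and the token list are assumptions for illustration; the generated ALGOL is not reproduced.

```python
# Hypothetical sketch of <error> ::= <any> [<any> but #;]*

def skip_error(tokens, pos, stop=";"):
    """Consume one symbol, then every following symbol that is not the
    stop symbol. Returns the new position, or None if nothing is left."""
    if pos >= len(tokens):
        return None
    pos += 1                                  # the leading <any>
    while pos < len(tokens) and tokens[pos] != stop:
        pos += 1                              # [<any> but #;]*
    return pos

tokens = ["integer", "3", "x", "+", ";", "real", "y"]
resume = skip_error(tokens, 1)                # the error starts after "integer"
print(resume, tokens[resume])
```

The stop symbol itself is not consumed, so whatever follows <error> in the grammar can still recognize the semicolon and carry on with the next declaration.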

Control  Cards 

The control cards for the ALGOL compiler (which compiles
<name>/SOURCE) and TWST/COMPILE should be on separate cards.
Amongst others there are:


$ LIST               causes the syntax to be listed as processed.

$ NOLIST             inhibits listing--listing is assumed.

$ TRACE2 or DEBUG    must be turned on at generation and running of
                     the compiler to cause the resulting compiler to
                     list what it is seeking and when it finds
                     particular nonterminals.

$ TRACE3 or $ EXEC   as above--causes exits to semantics to be printed.

$ TRACE9             lists scanner output.

$ RESERVE            will set the reserve word option as the presumed
                     setting in the compiler. It is already assumed--
                     the alternative is the special word designation.

$ SPECIAL <any character>
                     to set a given symbol as special word designator;
                     e.g., $ SPECIAL . means every word of the language
                     which is reserved must be marked by a period
                     (full stop).

Note that there are no facilities at present for making only
some words reserved, whereas the syntax can often be designed to
circumvent the need for reserved words.

In  order  for  TRACE2  or  TRACE3  to  operate  they  must  be  turned 
on  at  both  compiler  building  and  run  time. 

In the <statement> example given under Empty Productions, the
complication may be circumvented by reordering the production so that
the alternative with declarations is tried first:

<statement> ::=
    begin list <declaration>
        list <statement> separator #; end |
    begin list <statement> separator #; end | < > ;

Many of the traps associated with < > can be avoided by
using * or ?.

Assembly  Time  and  Execution  Time 

The  syntax  can  be  processed  at  about  60  cards/minute  for 
large  grammars.  Most  of  this  time  is  spent  in  formatting  and  it  is 
hoped  to  improve  this  figure. 

The  resulting  compilers  should  operate  at  a  minimum  of 
1000  cards/minute  parsing  time.   Cleverly  written  grammars  can 
almost  double  the  parsing  speed. 